CURMUDGUCATION

Monday, March 26, 2018

Big Brother Wants To Read Your Face

Imagine if you were presenting in front of a large group, and you could instantly get feedback on how you were doing. Perhaps you could read the body language of your audience or notice the expressions on their faces. Suppose you could check to see if your audience was understanding you, or following you, or happy or sad about what you were saying. Imagine that you-- oh, no, wait. You don't have to imagine that because you are a semi-intelligent functioning human being.

Let me start over.

Imagine you had computer software that could do all that for you.

A month ago, Inside Higher Ed reported on just such a chunk of software.

With sentiment analysis software, set for trial use later this semester in a classroom at the University of St. Thomas, in Minnesota, instructors don’t need to ask. Instead, they can glance at their computer screen at a particular point or stretch of time in the session and observe an aggregate of the emotions students are displaying on their faces: happiness, anger, contempt, disgust, fear, neutrality, sadness and surprise.

Well, that sounds... creepy? Unnecessary? Unlikely to be successful? Could you just lean in to your webcam for a second so I can see how this is going over?

I show the computer my finger and it doesn't know what the hell is happening

Maybe it's all the years I've spent as a hack musician, performing in front of all sorts of crowds, on top of all my years in a classroom, but I'm thinking that if you can't read the room then A) you might very well be a lousy teacher and B) you probably aren't nimble enough to respond to software that reads the room for you.

Python code captures video frames from a high-definition webcam and sends them to the Emotion interface, which determines the emotional state. Then that analysis comes to the Face interface, which returns the results, and draws a bounding box around the faces, along with a label for the given emotion. The Python code also sends the results to the PowerBI platform for visualization and retention.

Right. And just in case you think this could be helpful for someone teaching a huge class of 500, I'll note that right now the software tops out at 42.

There are technical issues, like "training" software to read emotions on human faces. Then there's the huge leap of logic that assumes that Pat's expression of disgust... Then there's the well-documented problem of facial recognition software that can't reliably recognize non-white faces. Then there's the problem of storing all that data and using it for other purposes, like evaluating teachers ("Sorry, Professor Bogswaddle, but your class turned up too many 'yuck' faces this semester").

Like many software developers of this ilk, the folks at St. Thomas’s E-Learning and Research Center (STELAR) have done some piloting, which itself raises questions. We're talking here about Eric Tornoe, associate director of research computing in the information technology services department, team leader on this project:

Tornoe, his assistant and a part-time student employee have served as the software's three main guinea pigs thus far. In the process of "pulling faces" to test different emotions, the team found that surprise and anger were the easiest to perform and detect, while contempt was the trickiest.

The team also road tested the technology with unsuspecting audiences at staff meetings and presentations, according to Tornoe. (A spokesperson for the university said the staff meeting audiences were prepared to see a presentation about sentiment analysis, so they weren't caught entirely off guard.)

I'm worried about a class in which surprise, anger and contempt are prevailing emotions. And there is a huge creepiness factor with trying this out with "unsuspecting" audiences. But then, I'm betting that unsuspecting audiences are the only ones for which this would have a hope in hell of working.

Because what do you suppose happens when you tell a bunch of students that software will be monitoring, analyzing, and reporting on their facial expressions to the teacher? What happens when we tell a student, "The computer will be watching you the whole time to see if you understand." I am willing to bet that close-to-zero students respond, "Fine. I'll just carry on and behave as if the computer was not watching my every move."

I remember performing a simple experiment back when I was in high school. We decided that one side of the room would look at the teacher, looking engaged, smiling and nodding, while the other side of the room would act bored and disengaged. After a while, the teacher slowly moved over to work the side of the room that was giving her positive vibes. I can't imagine what fun students could have trying to mess with software. Actually, I can imagine some of it-- fake faces, playing Stump the Software, trying to manipulate the speed or direction of the class.

And that's still better than the other scenario I can imagine, which is the one in which the use of this software gives students blanket permission to be inert lumps. No need to be active and ask questions or join in discussion-- just passively let the software decide what the student is thinking, and she doesn't have to actually communicate anything.

The guys from Stanford working on a similar program suggest that it might be useful for students to see the readouts for the whole class, because... why? For students trying to game the system, that would be a great piece of realtime feedback. But otherwise, do students really need one more way to check and see if they're "normal"?

I have a hard time spotting the upside to any of this, other than I suppose it could help instructors who lack skills in dealing with carbon based life forms. But mostly it seems intrusive and creepy and enabling of all sorts of poor behavior. And, as is too often the case with ed tech, it appears to be in the hands of people who really haven't thought through even the most basic implications. Some folks get it:

George Siemens, executive director of LINK Research Lab at the University of Texas at Arlington, applauds the institution for its focus on students' emotions, but he says he doesn't see why technology is necessary to perform a task at which humans are intrinsically more capable.

"I think we’re solving a problem the wrong way," Siemens said. "Student engagement requires greater human involvement, not greater technology involvement."

But other folks-- the folks pioneering this stuff-- don't get a lot of things:

Instructors won’t be able to see individual students’ emotions, either in real time or after the fact -- heading off any immediate complaints that the technology is invasive on a personal level.

Nope. The technology is still invasive on the personal level. Just because the instructor isn't seeing that level of response (other than by, you know, looking and using her human brain) doesn't mean the software isn't collecting it. Still invasive.

Tornoe and his colleagues haven’t yet decided how much students will know about sentiment analysis before they're subjected to it; university administrators have final say over that decision, according to a spokesperson.

Nope. This is not even a question. Failing to tell students that everything down to their facial expressions is being monitored and potentially recorded-- that's just flat out wrong. The fact that Tornoe and his colleagues even think there's a decision to be made shows how far divorced they are from the reality of what they're proposing. This is all kinds of a bad idea.

Sunday, March 25, 2018

ICYMI: Happy Teen Spring Edition (3/25)

The reading list may be a little short this week; I've been busy putting on a show. As always, remember to amplify what you think the world needs to hear.

Mama, Don't Let Your Babies Grow Up To Be Teachers

Yup. Even some people outside the edu-blogosphere have noticed the profession has some problems these days. Andrew Heller lays it out.

Congressional Legislation Seeks To Fund School Vouchers for Military Families Despite Major Opposition from Military Families

Carol Burris takes a look at a sneaky plan under way to use military families as a prop for creating federal vouchers. Make sure you pay attention to this one.

The Facts About NJ Charter Schools Part III: Segregation by English Proficiency

Mark Weber (Jersey Jazzman) continues to piece out the research he conducted with Julia Sass Rubin, looking at NJ charter schools and how they aren't quite what they're cracked up to be.

Back to the Future in Massachusetts

Andrea Gabor takes a look at the commonwealth's problems with inequitable school funding, how they led to one lawsuit, and how they might be about to give rise to another.

Gender Bias in NYC Admissions

Leonie Haimson lays out how the admission test process for NYC schools shows gender bias.*

Success Academy: Reinventing High School for the Few Who Remain

Mercedes Schneider crunches some numbers and breaks down some implications of Success Academy's huge attrition rate.

Oklahoma: They Could Be The Next To Strike

Dana Goldstein takes a close look at just how bad things are in Oklahoma, and puts it into the context of teachers standing up for the greater good in austerity states.

We Need Civic On Line Reasoning in Our Schools

Nancy Flanagan looks at one of the great missing pieces in 21st century education.

Thank You For Your Service

I was going to try to round up some of the better pieces about the student march, but I'm just going to crib Jose Vilson's work instead. Here are several good recommendations.

* I originally incorrectly connected this piece to NYC charters instead of the public system

Friday, March 23, 2018

Opening Night

This weekend finally winds down my spring performance season. A couple of weeks ago we presented a two-school joint production of Shrek; tonight is opening night for my school's annual variety show. We call it the "Broadcast," named (as far as anyone knows) after The Big Broadcast, a 1932 variety film that kicked off a series of Big Broadcast movies through the decade.

I've been directing the show for a bunch of years now. It's a pleasure to work with students in an arena that is so directly organized around performance. We have singers, dancing groups, and any number of odd surprises. One year we included a student who had bought a sitar on line and taught himself to play it. This year we've got some bucket drummers and a trio of T Rex dancers. MCs each year develop and write their own sketches to tie the evening together. My main function is to be a glorified traffic cop and get all of this to flow smoothly (I'm also the stage crew adviser).

My other major function is to make sure that each act is the best it can be. While this involves some broad standards (have energy, know your material, don't sing in keys unrelated to that in which your accompanist is playing, etc), preparing a student variety show is the very definition of a non-standardized activity. Each singer approaches her material with her own personality and style. High levels of precision are more attainable for some dancing groups than for others (and for girls dancing in inflatable T Rex costumes, precision really isn't the point). Each performer has personal goals, brings personal skills and choices to the attainment of those goals, and responds to different styles of coaching and directing. How I work with them is also a function of the relationship that we have (or don't), as well as my own judgment about how far and hard I can push them without making things worse instead of better (demoralizing a performer pretty much never elicits their best work). Sometimes as we work together, we come up with really cool pieces of inspiration; sometimes the performers find their own way to a sweet spot outside the box (I wish I could show you Zoe and Matt doing Coldplay's "Yellow" with guitar and upright bass).

A full book show is, of course, even more challenging. I'm fortunate in that my years on the co-op show have been spent with a woman who is a hugely gifted director, who gets just how to create a picture, what movement to incorporate, how to pull all the pieces together. It's a combination of so many little details in the service of a bigger vision, all put together with the work of students who are, generally, not exactly fully mature performers. But there are moments when things come together, when we find space for bits of inspiration and improvisation, on top of the adjustments that one must make when translating vision into reality.

I've been doing school and community theater for thirty-some years, learning a little more every single time. But there's one thing I'm absolutely certain of.

A show could not be put together by a piece of software.

You could not give each performer some sort of standardized test that would generate data that in turn would determine how people should be cast and how they should go about developing their performance. You could not, in place of rehearsal, have everyone come in and take an on-line performer quiz that would generate a personalized set of software-generated performance improvement instructions. You could not even have the computer "watch" the student perform and then generate a personalized performance report.

The whole idea of a computer "directing" the school show is so transparently foolish it's hard to tell which parts to mock first, but in particular, think about how a student's performance would change if they were performing not for a live human, but for a piece of software. As a performer, it's very hard to get excited about performing for an audience that won't get excited back at you. Software doesn't know the student, doesn't have the background of knowledge to place the student's goals in the context of music in general, doesn't have the understanding to gauge what an audience may or may not respond well to, knows neither the difference between nor the proper timing of holding a hand or kicking a butt. Software doesn't know how to build the trust necessary to convince a performer "It may seem awkward at first, but trust me, this will help the performance."

Nor is it possible to compare all performers on some standardized scale. Was Tyler's dry, wry turn as Gomez Addams "better" than Kate's moving and layered portrayal of Belle? Did Forrest "beat" both of them by stopping the show with a single grunt as Lurch? Did the tap dance routine from ten years ago rate "higher" than last year's modern jazz-flavored dance group? And how do I compare either of them to the rock band that covered 99 Luftballons, or Maddy playing Budapest on ukulele? Exactly how would any piece of software crank out a 4-point-scale rating for any of these?

Bottom line: a computer is too dumb, too ignorant, too not-human to direct your school show.

Now. Does anybody really think that teaching students in a classroom is all that different from directing them on a stage?

I don't. And for all the reasons I believe that software would make a terrible show director, I think software makes a lousy teacher, no matter how "personalized" or algorithm-driven it is, no matter what super-duper data wingdoodles it has.

Spare a thought for us tonight. My students have some traditions to uphold, and they are more than ready to meet the challenge. There are many, many parts to teaching; this is one of the best.

Thursday, March 22, 2018

Teacher Evaluation: Plus or Minus?

Matthew Kraft is an Assistant Professor of Education and Economics at Brown University, and it says something about where we are today that there is even such a job. I look forward to universities hiring someone as a Professor of Microbiology and Sociology, or a Professor of Astrophysics and Cheeseburgers. The notion that economists are automatically qualified to talk about education continues to be one of the minor plagues upon us these days.

But I digress.

Kraft is in Education Next making some big, sweeping statements about his research:

When I present my research on teacher evaluation reforms, I’m often asked whether, at the end of the day, these reforms were a good or bad thing. This is a fair question—and one that is especially important to grapple with given that state policymakers are currently deciding on whether to refine or reject these systems under ESSA. For all the nuanced research and mixed findings that concern teacher evaluation reforms and how teachers’ unions have shaped these reforms on the ground, what is the end result of the considerable time, money, and effort we have invested?

I think I could skip right ahead to a conclusion here, but Kraft has created a nice little pro and con list that both helps us address the question and gives a quick and dirty picture of what reformsters think they have done in the teacher evaluation world. So let's see what he's got.

The Pro List

So here's what Kraft thinks are the "positive consequences" of new evaluation systems.

Growing national recognition of the importance of teacher quality.

Is that a consequence of new evaluation systems? Were that many people wandering about before saying, "You know, I don't think it matters at all whether school teachers are awesome or if they suck." Granted, we did have one group dedicated to the notion that we should have a system in which it doesn't matter which teacher you get, that every class should be "teacher-proofed," and that if we do all that well enough, we can park any warm body in a classroom. But those were the reformsters, and I'm not sure they've gotten any wiser on this point.

A slight shift toward the belief that some teachers are better or worse than others.

Again, I'm not sure this is news to ordinary civilians, but lord knows that reformsters have been complaining loudly and constantly that schools are loaded with Terrible Teachers who must be weeded out. How is this a positive, exactly?

The widespread adoption of rigorous observational rubrics for evaluating instructional practice that provide clear standards and a common language for discussing high-quality instruction.

Nope. This is not a positive. Reducing the evaluation of teacher quality to a "rigorous rubric" is not a positive. Academics and economists like it because it lets them pretend that they are evaluating teachers via cold, hard numbers, but you can no more reduce teaching to a "rigorous rubric" than you can come up with a rubric for marital success or parental effectiveness.

For that matter, not only is this not a positive, but there's not much evidence that it has actually happened in any meaningful new way. We've seen lots of teacher eval rubrics (aka checklists) before (get out your Madeline Hunter worksheet, boys and girls) but they never last, because they turn out to be bunk. But at the moment, rubrics and checklists still take a back seat in most districts to Big Standardized Test scores soaked in some kind of VAM sauce. So this item from the pro list is wrong twice.

New administrative data from student information systems that, linked to teacher human resource systems, allow administrators and researchers to answer a range of important questions about teacher effectiveness.

Holy jargonized bovine fecal matter! This is also not a positive. But it's a sign of where this list is headed.

More and better (albeit still imperfect) teacher performance metrics to inform important human capital decisions made by administrators.

See? By the time you're talking about "human capital decisions," you've lost the right to be taken seriously by people who actually work in education. Plus, this is just purple prose dressing up the old reformster idea that we should be using teacher evaluation data to decide who to hire and fire, which is old sauce and not a positive because Kraft has mistyped "albeit still imperfect" when what he surely means is "still completely invalid and unsupported."

Increased attention to the inequitable access to highly effective teachers across racial and socio-economic lines.

First, the "data" on relative awesomeness of teachers at poor schools is almost impossible to take seriously because A) it's based on crappy BS Testing data and B) comparing teachers at wealthy and poor schools is like comparing the speeds of people running down a mountain to the speeds of people running up one.

Second, we don't need a lot of hard data to know that non-wealthy, non-white schools get less support or that state and district funding systems inevitably shortchange those schools. If you can only believe that because you see numbers on a spreadsheet-- I mean, I guess it's swell that you've finally figured it out, but damn, what is wrong with you?

Increased turnover among less effective teachers.

You have no idea whether that is happening or not, because you have no way of knowing which teachers are doing well and which ones are doing poorly. The mere fact that you assume awesomeness or non-awesomeness is a permanent state for every teacher shows that you don't understand the issues involved here.

That's the pro list. And it's pretty much all bunk.

The Con List

What downside has there been to the evaluation revolution?

The loss of principals' time to a formal evaluation process and paperwork that (often) have little value.

That is correct. New evaluation systems have created a host of hoops for administrators to jump through, most of which serve no local purpose, but are simply there to satisfy a state bureaucracy's need to see numbers on forms.

The erosion of trust between teachers and administrators. That trust would be useful for real ongoing professional development.

That is correct. Because so many modern reformster evaluation systems were designed with the idea of weeding out all the Terrible Teachers, and because those evaluation systems are often based on random data that the teacher's job performance doesn't actually affect (looking at you, BS Test scores), teachers view the whole process with distrust. One of the most powerful things an administrator can say to a teacher is, "How can I help you do the kind of job you want to do?" These evaluation systems stand directly in the path of any such interaction.

An increased focus on individual performance at the potential cost of collective efforts.

I'm giving Kraft a bonus point for this one, because too many reformsters refuse to acknowledge that their evaluation systems set up a kind of teacher thunderdome, a system in which I can't collaborate with a colleague because I might just collaborate myself out of a raise or a job. Because a school doesn't make a profit, all teacher merit pay systems must be zero sum, which means in order for you to win, I must lose. This does not build collegiality in a building.

Decreased interest among would-be teachers for entering the profession.

There are certainly many factors at play here, but Kraft is right-- knowing that your job performance will be decided by a capricious, random and fundamentally unfair system certainly doesn't make the profession more attractive.

The costs associated with teacher turnover, particularly in hard-to-staff schools.

This is correct. Once teachers have been driven out or fired, schools cannot just go grab new teachers off the Awesome Teacher Tree in the back yard. Costs associated with turnover include a lack of stability and continuity at the school, which is not helpful for the students who attend.

I'd add that Kraft has missed a few, most notably the waste of time, money, and psychological energy on a system that doesn't provide useful or accurate information, but which presents teachers with, at a minimum, an attack on their own sense of themselves as professionals and, at a maximum, an attack on their actual earning power, or even their careers. When it comes to teacher evaluation, we are spending a lot of money on junk.

Looking at the balance sheet.

From my perspective, teacher evaluation reforms net a modest positive effect nationally. While my judgment is informed by a growing body of scholarship, it is also subjective, imprecise, and colored by my hope that the negative consequences can be addressed productively going forward.

As I often tell one of my uber-conservative friends, "We see different things here." Since I find none of the positives convincing or compelling, but all of the negatives strike me as accurate, if understated, I see the balance as overwhelmingly negative.

Kraft does ask some good questions in his concluding section. For instance, would schools, teachers and students be better off if states had not "implemented evaluation reforms at all"? It's a useful question because it reminds us that the current sucky system replaced previous sucky systems. But the critical difference is this:

Previous sucky evaluation systems may not have provided useful information about teachers (or depended on being used by good principals to generate good data). But at least those previous systems did not incentivize bad behavior. Modern reform evaluation systems add powerful motivation for schools to center themselves not on teachers or students or even standards, but on test results. And test-centered schools run upside down-- instead of meeting the students' needs, the test-centered school sees the students as adversaries who must be cajoled, coached, trained and even forced to cough up the scores that the school needs. The Madeline Hunter checklist may have been bunk, but at least it didn't encourage me to conduct regular malpractice in my classroom.

So yes-- everyone would be better off if the last round of evaluation "reforms" had never happened.

Kraft also asks whether "the rushed and contentious rollout of teacher evaluation reforms poison[ed] the well for getting evaluation right."

Hmmm. First, I'll challenge his assumption that rushed rollout is the problem. This is the old "Program X would have been great if it had been implemented properly" argument, but it's almost never the implementation, stupid. There's no good way to implement a bad program. Bad is bad, whether it's rushed or not.

Second, that particular well has never been a source of sparkling pure water, but yes, the current system made things worse. The problems could be reversed. The solution here is the same as the solution to many reform-created education problems-- scrap test-centered schooling. Scrap the BS Test. Scrap the use of a BS Test to evaluate schools or teachers or students. Strip the BS Test of all significant consequences; make it a no-stakes test. That would remove a huge source of poison from the education well.

Wednesday, March 21, 2018

To Facebook Or Not

In light of the most recent revelations about Facebook, folks are once again re-evaluating their relationship with the social media 800-pound gorilla. Should I be on there? Should I promote my social group, my blog, my hobbies on there?

I'm an early adopter. I hopped on when my daughter was a student at Penn State, back in the days when Facebook was only used by certain colleges and universities, and membership was open only to students and family members. It was glorious-- a tool that allowed us to stay in touch, show each other Cool Stuff we had seen. It was far more immediate and authentic than writing letters.

Over time it became increasingly complicated and complex, with the gates periodically opened to new groups of users, new utilities added, new ways to waste time on Facebook developed. I watched Facebook aggressively suck all manner of media and activity into its orbit and, like all the other online giants, try to create an inclusive ecosystem so that users would never have to leave. In many ways, Facebook was a leader in a race to become, as one wag put it, the new AOL.

My interest in the online world was already well-formed by the time I walked into Facebookland. It may have been in part a coincidence of history that the online world was ramping up just as I was figuring out how to get through weekends when my children were with their mother and I was in the house alone. I spent time on the old Prodigy BBS system, made friends on ICQ, read the adventures of early online adopters like one celebrity who wrote a terrible letter that would not die or go away.

It seemed fairly obvious in those days that human beings and their ability to create content of any sort, even if it was just filling up a message board or a chat channel (yeah, remember when we called things on line "channels"?), were a desired commodity. It seemed obvious that the online "community" deal was that you traded pieces of yourself for new connective capabilities. It seemed obvious that all of us who used these services were products.

How so many people lost sight of that, or failed to figure it out, is another discussion. But lose sight of it they did. People of my generation attribute magical powers and knowledge to digital natives, but the fact is, the vast majority of digital natives are dopes about online life, imagining that they are entitled to secrecy and privacy on line. It is a measure of the seductiveness of online life that the promise of secrecy and privacy has almost never been explicitly made, and yet so many people implicitly believe in it.

The internet is not private. It never has been. That's the first thing you have to understand about going there. The second thing to understand is that everything on line is essentially forever. I've told my students this over and over-- the secret to a happy internet life is to understand that everything you do is public and permanent. I guess the third thing to understand is that people are becoming increasingly creative about how to mine your online self for data.

Now, I'm not saying that if your privacy has been violated by Facebook or any other app it is your own fault. It's reasonable to assume that all of these companies will take steps to protect user privacy and data. But it's practical to assume that one way or another, they will fail. There's nothing wrong with telling a friend, "I'm going to leave this stack of money next to you while I run to the store. Will you keep an eye on it?" But it's a little silly to be shocked and surprised if some of the money is gone when you get back.

Every online activity is really a transaction. This blog's platform is owned by Google, and by running it and drawing in umpteen thousand views, I am making Google money (which is why my son-in-law says I really should be running ads here). But in exchange, I have had an opportunity to spread some words, raise some awareness, and create a tiny piece of noise for a cause I deeply believe in, and make important connections with other people similarly concerned. I'm satisfied with the balance on that transaction.

Likewise, promoting this blog via Facebook has helped me find more audience for my cause. I also use Facebook to maintain connections with old friends, students, and family. My older children live far away, and I have cousins that I've been lucky to see in person once or twice a decade. I get to see my grandchildren grow up. Thanks to Facebook, those connections are all stronger. I know I'm making money for Zuckerberg, but on balance, I'm satisfied with the value I'm getting out of the transaction.

Mind you, I'm thoughtful about what I post, and I keep an eye on my security settings. I don't generally take silly quizzes (which exist mostly to get you to give up access to your data in exchange for finding out which vegetable you most resemble). I'm aware that my digital pocket is being picked every day.

In fact, that sort of visibility is one of the reasons that I will keep maintaining an active Facebook page for this blog-- I want the data miners to know that there are people who care about public education and resisting the ed reform movement. I'm not delusional-- I know that this blog has a smaller footprint than, say, people who are concerned about what Justin Bieber is wearing today. But if I'm not here, my cause becomes slightly less visible, marginally easier to ignore. Am I using a tool that is morally compromised? Yes, certainly. I am not aware of a single piece of modern computer technology that isn't. I wish compromise and transaction weren't necessary to function in the modern world, but as near as I can see, they are. So I will continue to weigh the benefits against the cost, try to make my choices mindfully, and for the time being, use Facebook with full awareness that it is also using me.

Tuesday, March 20, 2018

AEI: Voiding the Choice Warranty

The American Enterprise Institute has a new report that calls into question one of the foundational fallacies of the entire reform movement. Think of it as the latest entry in the Reformster Apostasy movement.

Do Impacts on Test Scores Even Matter? Lessons from Long-Run Outcomes in School Choice Research asks some important questions. We know they are important questions because some of us have been asking and answering them for twenty years.

Here are the key points as AEI lists them:

For the past 20 years, almost every major education reform has rested on a common assumption: Standardized test scores are an accurate and appropriate measure of success and failure.

This study is a meta-analysis on the effect that school choice has on educational attainment and shows that, at least for school choice programs, there is a weak relationship between impacts on test scores and later attainment outcomes.

Policymakers need to be much more humble in what they believe that test scores tell them about the performance of schools of choice: Test scores should not automatically occupy a privileged place over parental demand and satisfaction as short-term measures of school choice success or failure.

Yup. That's just about it. The entire reformster movement is based on the premise that Big Standardized Test results are a reliable proxy for educational achievement. They are not. They never have been, and some of us have been saying so all along. Read Daniel Koretz's book The Testing Charade: Pretending To Make Schools Better for a detailed look at how this has all gone wrong, but the short answer is that when you use narrow unvalidated badly designed tests to measure things they were never meant to measure, you end up with junk.

AEI is not the first reform outfit to question the BS Tests' value. Jay Greene was beating this drum a year and a half ago:

But what if changing test scores does not regularly correspond with changing life outcomes? What if schools can do things to change scores without actually changing lives? What evidence do we actually have to support the assumption that changing test scores is a reliable indicator of changing later life outcomes?

Greene concluded that tests had no real connection to students' later-in-life outcomes and were therefore not a useful tool for policy direction. Again, he was saying what teachers and other education professionals had been saying since the invention of dirt, but to no avail.

In fact, if you are of a Certain Age, you may well remember the authentic assessment movement, which declared that the only way to measure any student knowledge and skill was to have the student demonstrate something as close as possible to the actual skill in question. IOW, if you want to see if the student can write an essay, have her write an essay. Authentic assessment frowned on multiple choice testing, because it involves a task that is nothing like any real skill we're trying to teach. But ed reform and the cult of testing swept the authentic assessment movement away.

Really, AEI's third paragraph of findings is weak sauce. "Policymakers should be much more humble" about test scores? No, they should be apologetic and remorseful that they ever foisted this tool on education and demanded it be attached to stern consequences, because in doing so they wrought a great deal of damage on US education. "Test scores should not automatically occupy a privileged place..."? No, test scores should automatically occupy a highly unprivileged place. They should be treated as junk unless and until someone can convincingly argue otherwise.

But I am reading into this report a wholesale rejection of the BS Test as a measure of student, teacher, or school success, and that's not really what AEI is here to do. This paper is focused on school choice programs, and it sets out to void the warranty on school choice as a policy.

Choice fans, up to and including Education Secretary Betsy DeVos, have pitched choice in terms of its positive effects on educational achievement. As DeVos claimed, the presence of choice will not only create choice schools that outperform public schools, but the public schools themselves will have their performance elevated. The reality, of course, is that it simply doesn't happen. The research continues to mount that vouchers, choice, charters-- none of them significantly move the needle on school achievement. And "educational achievement" and "school achievement" both really mean only one thing-- test scores.

Choice was going to guarantee higher test scores. They have had years and years to raise test scores. They have failed. If charters and choice were going to usher in an era of test score awesomeness, we'd be there by now. We aren't.

So what's a reformster to do?

Simple. Announce that test scores don't really matter. That's this report.

There are several ways to read this report, depending on your level of cynicism. Take your pick.

Hardly cynical at all. Reformsters have finally realized what education professionals have known all along-- that the BS Tests are a lousy measure of educational achievement. They, like others before them, may be late to enlightenment, but at least they got there, so let's welcome them and their newly illuminated epiphanic light bulbs.

Kind of Cynical. Reformsters are realizing that the BS Tests are hurting the efforts to market choice, and so they are trying to shed the test as a measure of choice success because it clearly isn't working and they need to reduce the damage being done to the choice brand.

Supremely Cynical. Reformsters always knew that the BS Test was a sham and a fraud, but it was useful for a while, just as Common Core was in its day. But just as Common Core was jettisoned as a strategic argument when it was no longer useful, the BS Test will now be tossed aside like a used-up Handi Wipe. The goal of free market corporate reformsters has always been to crack open the vast funding egg of public education and make it accessible to free marketeers with their education-flavored business models. Reformsters would have said that choice clears up your complexion and gives you a free pony if they thought it would sell the market-based business model of schooling, and they'll continue to say-- or stop saying-- anything as long as it helps break up public ed and makes the pieces available for corporate use.

Bottom line. Having failed to raise BS Test scores, some reformsters would now like to promote the entirely correct idea that BS Tests are terrible measures of school success, and so, hey, let's judge choice programs some other way. I would add, hey, let's judge ALL schools some other way, because BS Testing is the single most toxic legacy of modern ed reform.

Monday, March 19, 2018

OH: Computers Are Grading Essays

No sooner had I vigorously mocked the idea of using computers to grade essays than this came across my desk:

CLEVELAND, Ohio - Computers are grading your child's state tests.

No, not just all those fill-in-the-bubble multiple choice questions. The longer answers and essays too.

According to State Superintendent Paolo DeMaria and state testing official Brian Roget (because "state testing official" is now a job-- that's where we are now), about 75% of Ohio's BS Tests are being fully graded by computers.

This is a dumb idea.

"The motivation is to be as effective and efficient and accurate in grading all these things," DeMaria told the board. He said that advances in AI are making this new process more consistent and fair - along with saving time and, in the long run, money.

If you think writing can be graded effectively and efficiently and accurately by a computer, then you don't know much about assessing writing. The saving money part is the only honest part of this.

But all the kids are doing it, Mom. American Institutes for Research (AIR-- which is not a research institute at all, but a test manufacturer) is doing it in Ohio, but Pearson and McGraw-Hill and ETS are all doing it, too, so you know it's cool.

DeMaria said that the research is really "compelling," which is another word for "not actually proving anything," and he also claims that even college professors are using Artificial Intelligence to grade papers. He does not share which colleges, exactly, are harboring these titans of educational malpractice. Would be interesting to know. Meanwhile, Les Perelman at MIT has made a whole second career out of repeatedly demonstrating that these essay grading computers are incompetent boobs.

The shift from human scorers is usually a little controversial, which may be why Ohio just didn't tell anyone it was happening. It came to light only after, the article notes wryly, "irregularities" were noticed in grades. Oddly enough, that constitutes a decent blind test of the software-- folks could tell it was doing something wrong even when they didn't know that software was doing the grading.

Some Ohio board members think the shift is just fine, though one picked an unfortunate example:

"As a society, we're on the cusp of self-driving vehicles and we're arguing about whether or not AI can grade a third grade test?" asked recently-appointed board member James Shephard. "I think there just needs to be some perspective here."

I feel certain that as Shephard spoke, he was unaware that a self-driving vehicle just killed a pedestrian in Arizona.

The actual hiccup that called attention to the shift from meat widget grading was a large number of third grade reading tests that came back with a score of zero. That was apparently because they quoted too much of the passage they were responding to, though they are supposed to cite specific evidence from the text. It's the kind of thing that a live human could probably figure out, but since computer software does not actually understand what it is "reading"-- well, zeros. On a test that will determine whether or not the student can advance to fourth grade (because Ohio has that stupid rule, too).

I don't understand a word you just said, but you fail!

The state has offered some direction (30% is the tipping point for how much must be "original") so that now we have the opening shot in what is sure to be a long volley of rules entitled "How to write essays that don't displease the computer." Surely an admirable pedagogical goal for any writing program.

The state reported that of the thousand tests submitted for checking, only one was rescored. This fits with a standard defense of computer grading-- "When we have humans score the essays, the scores are pretty much the same as the computer's." This defense does not move me, because the humans have their hands and brains tied, strapped to the same algorithm that the computer uses. Of course a human gets the same score, if you force that human to approach the essay just as stupidly as the computer does. And computers are stupid-- they will do exactly as they're told, never understanding a single word of their instructions.

The humans-do-it-too defense of computer grading ignores another problem of this system-- audience. Perhaps on the first go round you'll get authentic writing that's an actual measure of something real. But what we already know from stupid human scoring of BS Tests is that teachers and students will adapt their writing to fit the algorithm. Blathering on and on redundantly and repetitiously may be bad writing any other time, but when it comes to tests, filling up the page pleases the algorithm. The algorithm also likes big words, so use those (it does not matter if you use them correctly or not). These may seem like dumb examples, but my own school has had success gaming the system with these rules and rules like them.

And this is worse. I've heard legitimate arguments from teachers who say the computer's ability to sift through superficial details can be one part of a larger, meat-widget based evaluation system, and I can almost buy that, but that's not what Ohio is doing-- they are handing the whole evaluation over to the software.

What do you suppose will happen when students realize that the computer will not care if they illustrate a point by referring to John F. Kennedy's noble actions to save the Kaiser during the Civil War? What do you suppose will happen when students realize that they are literally writing for no human audience at all? How will they write for an algorithm that can only analyze the most superficial aspects of their writing, with no concern or even ability to understand what they are actually saying?

This is like preparing a school band to perform and then having them play for an empty auditorium. It's like having an artist do her best painting and then hanging it in a closet. Even worse, actually-- this is like having those endeavors judged on how shiny they are, still unseen and unheard by human eyes and ears.

Ohio was offered a choice between doing something cheap and doing something right, and they went with cheap. This is not okay. Shame on you, Ohio.
