Thursday, July 9, 2015

PA: Ugly Cut Scores Coming

Brace yourself, Pennsylvania teachers. The cut scores for last year's tests have been set, and they are not pretty.

Yesterday the State Board's Council of Basic Education met to settle its recommendations to the State Board of Education regarding cut scores for the 2014-2015 test results. Because, yes-- cut scores are set after test results are in, not before. You'll see why shortly.

My source at the meeting (don't laugh-- I do actually have sources of information here and there) passed along some of the results, as well as an analysis of the impact of the new scores and the Board's own explanation of how these scores are set. The worst news is further down the page, but first I have to explain how we get there.

How Are Scores Set?

In PA, we stick with good, old-fashioned Below Basic, Basic, Proficient and Advanced. The cut scores-- the scores that decide where we draw the line between those designations-- come from two groups.

First, we have the Bookmark Participants. The bookmark participants are educators who take a look at the actual test questions and consider the Performance Level Descriptors, a set of guidelines that basically say "A proficient kid can do these following things." These "have been in place since 1999" which doesn't really tell us whether they've ever been revised or not. According to the state's presentation:

By using their content expertise in instruction, curriculum, and the standards, educators made recommendations about items that distinguished between performance levels (eg Basic/Proficient) using the Performance Level Descriptors. When educators came to an item with which students had difficulty, they would place a bookmark on that question. 

In other words, this group set dividing lines between levels of proficiency in the way that would kind of make sense-- Advanced students can do X, Y, and Z, while Basic students can at least do X. (It's interesting to note that, as with a classroom test, this approach doesn't really get you a cut score until you fiddle with the proportion of items on the test. In other words, if I have a test that's all items about X, everyone gets an A, but if I have a test that's all Z, only the advanced kids so much as pass. Makes you wonder who decides how much of what to put on the Big Standardized Test and how they decide it.)
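To put a number on that parenthetical: here is a minimal sketch, with item counts I invented for illustration (none of this comes from the PDE presentation), of how the same bookmark judgments translate into very different cut scores depending on the mix of items on the test.

```python
# Invented illustration: under a bookmark-style standard, a "proficient"
# student is expected to answer the easy (X) and medium (Y) items but
# not the hard (Z) ones. The raw cut score then depends entirely on how
# many of each item type the test-makers chose to include.

def raw_cut_score(n_easy, n_medium, n_hard):
    """Raw score a proficient student is expected to reach: all easy
    and medium items correct, none of the hard ones."""
    return n_easy + n_medium

# Same 60-item test length, two different item mixes:
balanced = raw_cut_score(n_easy=30, n_medium=20, n_hard=10)    # 50 of 60
hard_heavy = raw_cut_score(n_easy=10, n_medium=10, n_hard=40)  # 20 of 60

# The "proficient" line lands at 83% correct on one test, 33% on the other.
print(round(balanced / 60, 2), round(hard_heavy / 60, 2))  # 0.83 0.33
```

Same standard, same students-- but whoever picks the item mix is effectively picking the cut score.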

Oh, and where do the committee members come from? My friend clarifies:

The cut score panelists were a group that answered an announcement on the Data Recognition website, who were then selected by PDE staff. 

Plus one of the outside "experts" was from the National Center for the Improvement of Educational Assessment, one more group that thanks the Gates Foundation for support.

But Wait-- That's Not All

But if we set cut scores based on difficulty of various items on the BS Test, why can't we set cut scores before the test is even given? Why do we wait until after the tests have been administered and scored?

That's because the Bookmark Group is not the end of the line. Their recommendations go on to the Review Committee, and according to the state's explanation:

The Review Committee discussed consequences and potential implications associated with the recommendations, such as student and teacher goals, accountability, educator effectiveness, policy impact and development, and resource allocation.

In other words, the Bookmark folks ask, "What's the difference between a Proficient student and a Basic student?" The Review Committee asks, "What will the political and budget fallout be if we set the cut scores here?"

It is the review committee that has the last word:

Through this lens, the Review Committee recommended the most appropriate set of cut scores-- using the Bookmark Participants recommendation-- for the system of grade 3-8 assessments in English Language Arts and Math.

So if you've had the sense that cut scores on the BS Test are not entirely about actual student achievement, you are correct. Well, in Pennsylvania you're correct. Perhaps we're the only state factoring politics and policy concerns into our test results. Perhaps I need to stand up a minute to let the pigs fly out of my butt.

Now, the PowerPoint presentation from the meeting said that the Review Committee did not mess with the ELA recommendations of the bookmark folks at all. They admit to a few "minor" adjustments to the math, most having to do with cut scores on the lower end.

You Said Something About Ugly

So, yes. How do the cut scores actually look? The charts from the PowerPoint do not copy well at all, and they don't provide any context. But my friend in Harrisburg created his own chart showing how the proposed cut scores stack up against last year's results. This chart shows the percentage of students who fall into the Basic and Below Basic categories.

[Chart: percentages of students scoring Basic or Below Basic, by grade, last year's results vs. under the proposed cut scores. The figures did not copy legibly and are omitted here.]
This raises all sorts of questions. Did all of Pennsylvania's teachers suddenly decide to suck last year? Is Pennsylvania in the grip of astonishing innumeracy? And most importantly, what the hell happened to the students?

Because, remember, we can read this chart a couple of ways, and one way is to follow the students-- so last year only 22.8% of the fifth graders "failed." But this year those exact same students, just one year older, have a 60.2% failure rate??!! 37% of those students turned into mathematical boneheads in just one year??!! 47% of eighth graders forgot everything they learned as seventh graders??!! 70% failure??!! Really????!!!! My astonishment can barely press down enough punctuation keys.
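For what it's worth, the percentage-point arithmetic behind that astonishment, using only the figures quoted above (a back-of-the-envelope check, not anything from the Board's materials):

```python
# Cohort comparison from the post: last year's fifth graders vs. the
# same students as sixth graders under the new cut scores.
last_year_below_proficient = 22.8   # percent of 5th graders who "failed"
this_year_below_proficient = 60.2   # percent of the same cohort in 6th grade

jump = this_year_below_proficient - last_year_below_proficient
print(round(jump, 1))  # 37.4 percentage points-- the ~37% cited above
```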

Said my Harrisburg friend, "There was some pushback from Board members, but all voting members eventually fell in line. It was clear they were ramming this through."

"Farce" doesn't seem too strong a word. 

At a minimum, this will require an explanation of how the math abilities of Pennsylvania students or Pennsylvania teachers could fall off such a stunningly abrupt cliff.

And that's before we even get to the question of the validity of the raw data itself. Of course, none of us are supposed to be able to discuss the BS Test ever, as we've signed an Oath of Secrecy, but we've all peeked, and I can tell you that I remember chunks of the 11th grade test in the same way that I remember stumbling across a rotting carcass in the woods or vivid details from my divorce-- unpleasant, painful, awful things tend to burn themselves into your brain. Point being, this whole exercise starts with tests that aren't very good to begin with.

If I'm teaching a class and suddenly my failure rate doubles or almost triples, I am going to be looking for things that are messed up-- and it won't be the students.

The theme of yesterday's meeting should have been "Holy smokes!! Something is really goobered up with our process because these results couldn't possibly be right" and the theme of the meeting today when these cut scores are recommended to the whole State Board of Education should be the same.

My source thinks it's a done deal and some folks are scrambling to let people like, say, Governor Tom Wolf know that if this happens, there will be a great deal of Spirited Displeasure out in the schools and communities. If you happen to have the phone number of someone in Harrisburg who could be useful, this morning would be a good time to call.

The Chair of the State Board, Larry Wittig; the Deputy Secretary of Education, Matthew Stem; and the Chair of the Council of Basic Ed, former State Board Chair and former Erie City Superintendent James Barker, apparently are the conductors on this railroad. So when it turns out that your teacher evaluation just dove straight into the toilet because of these shenanigans, be sure to call them.

And here's the list of Board members, though you will literally need to contact them within the next few hours.

I will do my best to keep an eye on things and let you know, but in the meantime, if you're a PA 3-8 teacher, you'd better fasten your seatbelt, because this ride is about to get bumpy.

Update: These numbers did indeed pass. Sorry, colleagues. 


  1. the exact opposite seems to be happening in Oregon and Washington (except for some math). We are not nearly as far along the "reform" movement as most other states. This was our first year for SBAC. Below is a message I sent to PAA members yesterday.

    It seems that both Oregon and Washington are reporting higher SBAC scores than the projected results that other states have had in previous years. (The scores are still abysmal.) Here in Oregon, Stand for Children is gushing all over this, saying how teachers have worked hard to do a better job, and students have risen to the challenge of tougher standards, blah, blah, blah.

    I’m wondering if we can poll our members and find out if other states are seeing this “success” as well. (For PARCC, too.) It may be a bit early. Washington only published preliminary results last week. Can everyone be on the lookout for results in your state and share with us as soon as you find out?

    I have so many questions. About cut scores. About the field test. About the legitimacy of this entire process.

    Here is a link to how the assessment process works. Watch the video. Unbelievable!

    Prediction and scores for Oregon and Washington are at the links below. The prediction score is the same for all states apparently, even though each state did separate field tests. What’s up with that?


  2. Over a decade ago, when I was attending lots of MI State Bd of Ed meetings, I--by chance--happened to attend one where cut scores for the state tests (the MEAP, at that time) were set. Until that day, I had been laboring under the misconception that test scores provided objective data, and that statistical analysis of testing data by highly qualified psychometricians would show where natural breaks between those same four categories occurred. I'm guessing that most Americans believe that there is some kind of science behind labeling children and schools (and now, of course, teachers).

    But no. The cut scores that doomed kids and schools (and, lately, teachers) were set almost randomly, by 7 elected officials with no particular assessment expertise, all of whom had "reform" axes to grind. They began with the question: What percentage of kids do we want to fail--to be labeled "below basic?"

    There was a mix of Rs & Ds on the Board, progressives who still believed in public education as well as a couple of "public schools have failed" types. The setting of cut scores involved lots of horse-trading and political infighting. But nothing resembling statistical analysis.

    It was a powerful lesson, one I've shared many times. But still--Joe Average believes that testing data is real, cut scores are real and should drive policy.

  3. It has always been so and will always be so. Because cut-score determination -- and ipso facto high stakes testing -- is a farce.

  4. James Barker is not the current Superintendent of Erie's Public Schools. The Superintendent is Dr. Jay Badams.

  5. Noooo.... we were formulating plans to flee Maryland, new home of Andy Smarick and Checker Finn on our state BoE (thanks fer nothin', Larry Hogan. :-(), in favor of PA where we both grew up.

    Damn, homeschooling is looking better and better.

  6. Could this be the reason?

    Prior to Common-Core Tests, Some States Raised the Bar for 'Proficiency' | EdWeek | Curriculum Matters

    It's happening in other states.

    More than 7 in 10 high school juniors in Washington State FAIL the Unfair SBAC Math Test | Wait What?

  8. So much for waivers. NCLB's insane mentality of annually rising score demands is alive and thriving behind closed doors.