Wednesday, February 11, 2015

Testing the Invisibles

Last weekend, Chad Aldeman of Bellwether Education Partners took to the op-ed pages of the NYT to make his case for annual standardized testing. I offered my response to that here (short version: I found it mostly unconvincing).

But Aldeman is back today on Bellwether's blog to elaborate on one of his supporting points, and I think it's worth responding to because it's one of the more complicated fails in the pro-testing argument.

Aldeman's point is this: NCLB's requirement that districts be accountable for subgroups forced schools to pay attention to previously-ignored portions of their student population, and that led to extra attention that paid off in test score gains for members of those groups. Aldeman did some data crunching, and he believes his crunched results show "a move away from annual testing would leave many subgroups and more than 1 million students functionally “invisible” to state accountability systems."

This whole portion of the testing argument shows a perfect pairing of a real problem and a false solution. I just wrote about how this technique works, but let me lay out what the issue is here.

I believe that Aldeman's statement of the basic issue is valid. I believe that we are right to question just how much certain school districts hope to hide their problem students, their difficult students, their we-just-aren't-sure-what-to-do-with-them students. I believe it's right to make sure that a school is serving all students, regardless of race, ability, class, or any other differential identifier you care to name.

But where Aldeman and I part ways comes next.


Are tests our only eyes?

Aldeman adds a bunch of specific data about how many groups of students at various districts would become invisible if annual testing stopped, which just makes me ask-- is a Big Standardized Test (BST) the only possible way to see those students? There's no other possible measure, like, say, the actual grades and class performance in the school, that the groups could be broken out of? (And-- it should be noted that Aldeman skips right over the part where we ask if any such ignoring and invisibility was actually taking place.)

Because I'm thinking that not only are Big Standardized Tests not the only possible way to hold schools accountable for how they educate the subgroups, but they aren't even the best way. Or a good way.

Disaggregated bad data is still bad data.

Making sure that we break out test results for certain subgroups is only useful if the test results tell us something useful. There's no reason to believe that the PARCC, the SBA, and the various other Big Standardized Tests tell us anything significant about the quality of a student's education.

Aldeman writes that losing the annual BST would be bad "because NCLB’s emphasis on historically disadvantaged groups forced schools to pay attention to these groups and led to real achievement gains." But by "real achievement gains" Aldeman just means better test scores, and after over a decade of test-based accountability, we still have no real evidence that test scores have anything to do with real educational achievement.

This part of the argument continues to be tautological-- we need to get these students' test scores because otherwise, how will we know what their test scores are. The testy worm continues to devour its own tail, but still nobody can offer evidence that the BST measures any of the things we are rightfully concerned about.

Still, even as bad data, it forces school districts to pay attention to these "historically disadvantaged groups." That's got to be a good thing, right?

Well, no.

The other point that goes unexamined by Aldeman and other advocates of this argument is just what being visible gets these students.

Once we have disaggregated a group and rendered them visible, what exactly comes next?

Does the local district say, "Wow-- we must take steps to redirect resources and staff to make sure the school provides a richer, fuller, better education to these students"? Does the state say, "This district needs an increase in state education aid money in order to meet the needs of these students"?
Generally, no.

Instead, the students with low test scores win a free trip to the bowels of test-prep hell. Since NCLB began, we've heard a steady drip-drip-drip of stories about students who, having failed the BST (or the BST pre-test that schools started giving for precisely the purpose of spotting probable test-failers before they killed the school's numbers) lose access to art and music and gym or even science and history. These students get tagged for days filled with practice tests, test prep, test practice, test sundaes with test cherries on top. In order to ensure that their test scores go up, their access to a full, rounded education goes down. This is particularly damaging when we're talking about students who have great strengths in areas that have nothing to do with taking a standardized reading and math test.

Disaggregation also makes it easier to inflict Death By Subgroup on a school. Too many low BST subgroup failures, and a school can become a target for turnaround or privatization.

Visibility needs a purpose

Nobody should be invisible-- not in school, not in life. But it's not enough just to be seen. It matters what people do once they see you.


So far we have mostly failed to translate visibility into a better education for members of the subgroups. In fact, at many schools we have actually given them less education, an education in nothing but test taking. And by making them the instruments of a school's punishment, we encourage schools to view these students as problems and obstacles rather than human beings to assist and serve.

NCLB turned schools backwards, turning children from students to be served by the school into employees whose job is to earn good test scores for the school. As with many portions of NCLB, the original goal may well have been noble, but the execution turned that goal into a toxic backwards version of itself.

Making sure that "historically disadvantaged subgroups" don't become overlooked and under-served (or, for that matter, ejected by a charter school for being low achievers) is a laudable and essential goal, but using Big Standardized Tests, annually or otherwise, fails as an instrument of achieving that goal.

ESEA Hearing: What Wasn't Answered

The first Senate hearing on the NCLB rewrite focused on testing and accountability. Discussion at and around the hearing has centered on questions of the Big Standardized Test. How many tests should be given? How often should the test be given? Should it be a federal test or a state test? Who should decide where to draw the pass-fail line on the test?

These are all swell questions to ask, but they are absolutely pointless until we answer a more fundamental question:

What do the tests actually tell us?

Folks keep saying things such as "We need to continue testing because we must have accountability." But that statement assumes that tests actually provide accountability. And that is a gargantuan assumption, leading Congress to contemplate building a five-story grand gothic mansion of accountability on top of a foundation of testing sand in a high stakes swamp.

The question did not go completely unaddressed. Dr. Martin West led off with some observations about the validity of the test. And then he trotted out Chetty, Friedman and Rockoff (2014), a study that piles tautology (we define good teachers as those with good test results, and then we discover that those good teachers get good test results; also, red paint is red) on top of correlation dressed up as causation. If you like your Chetty debunking with a more scholarly flair, try this. If you like it with Phineas and Ferb references, try this.

Then West piled up more correlation dressed as causation. Citing Deming et al. (2014), West takes a stand for the predictive power of testing, and in doing so, he himself makes clear why his support of testing validity is actually no support at all.

Predictive power is not causation. Let's take a stroll through a business district and meet some random folks. I'll bet you that the quality of their shoes is predictive of the quality of their cars and their homes. Expensive shoes predict a Lexus parked in front of a five story grand gothic mansion.
It does not follow, however, that if I buy really nice shoes for all the homeless people in that part of town, they will suddenly have expensive homes and fancy cars.

And here's how test-based accountability works. People off in some capital tell local authorities, "We want to end homelessness. So we expect pictures of all your homeless wearing nice shoes. And if the number doesn't go up, we will dock your pay, kill your dog, and take away your dessert for a year." The local authorities will get those pictures (even if they have to use fake shoes or the same shoes on multiple feet), send off the snapshots to the capital, the capital folks will congratulate themselves for ending homelessness, and the homeless people will still be sleeping under a bridge and not in a fancy gothic mansion.
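The shoes-and-mansions stroll is the classic confounding setup, and it's easy to watch it happen in a toy simulation (everything here is invented for illustration-- the numbers have nothing to do with any real study): a hidden common cause makes one variable predict another even though meddling with the predictor changes nothing.

```python
import random

random.seed(0)

# Hidden common cause: wealth drives both shoe quality and home value.
wealth = [random.uniform(0, 100) for _ in range(1000)]
shoes = [w + random.uniform(-5, 5) for w in wealth]
homes = [w + random.uniform(-5, 5) for w in wealth]

def correlation(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Shoes strongly "predict" homes, because both track wealth.
corr = correlation(shoes, homes)
print(round(corr, 2))

# Now "intervene": nice shoes for everyone. Home values don't budge,
# because shoes never caused them in the first place.
homes_before = list(homes)
shoes = [100.0] * len(shoes)
print(homes == homes_before)  # True
```

The correlation comes out near 1.0 before the intervention, and the intervention leaves home values untouched-- which is the whole point about predictive power.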

Another version of the same central question that was neither asked nor answered at the hearing would be:

What would give us the best, most complete, most accurate sense of how well educated a young person might be? How many people would seriously answer, "Oh, given the need to measure the full range of a person's skills, knowledge and aptitudes, I would absolutely depend on a bubble test covering just two thin slivers out of the whole pizza of that person?" When you think of a well-educated person, do you automatically think of a person who does really well on standardized tests of certain math and reading skills?

Oddly enough, it was a nominally pro-test witness whose testimony underlined that. Paul Leather, of the New Hampshire Department of Education, testified at some length about the Granite State's extensive work in developing something more like a whole-child, full-range assessment-- something that is robust and flexible and individual and authentic and basically everything that a standardized mass-produced test is not.

Congress put the cart not only before the horse, but before the wheels came back from the blacksmith shop. What they need to do is bring in the testing whizzes of Pearson/PARCC/SBA/etc and ask them to show how the Big Standardized Test measures anything other than a student's ability to take the Big Standardized Test. And I have not even addressed the question of whether or not the Big Standardized Test accurately measures even the slim slice of skills that it claims to assess-- but that question needs to be asked as well. We're missing serious discussions of testing's actual results, like this one. Instead, Congress engaged in a long discussion of how best to clean and press the emperor's new clothes.

There is no point in discussing what testing program best provides accountability if the tests do not actually measure any of the things we want schools to be accountable for. You can build your big gothic mansion in the swamp, but it will be sad, scary and dangerous for any people who have to live there.

Originally posted at View from the Cheap Seats

Tuesday, February 10, 2015

Sorting the Tests

Since the beginnings of the current wave of test-driven accountability, reformsters have been excited about stack ranking-- the process of sorting out items from the very best to the very worst (and then taking a chainsaw to the very worst).

This has been one of the major supporting points for continued large-scale standardized testing-- if we didn't have test results, how would we compare students to other students, teachers to other teachers, schools to other schools? The devotion to sorting has been foundational, rarely explained but generally presented as an article of faith, a self-evident value-- well, of course, we want to compare and sort schools and teachers and students!

But you know what we still aren't sorting?

The Big Standardized Tests.

Since last summer the rhetoric to pre-empt the assault on testing has focused on "unnecessary" or "redundant" or even "bad" tests, but we have done nothing to find these tests.

Where is our stack ranking for the tests?

We have two major BSTs-- the PARCC and the SBA. In order to better know how my child is doing (isn't that one of our repeated reasons for testing?), I'd like to know which one of these is a better test. There are other state-level BSTs that we're flinging at our students willy-nilly. Which one of these is the best? Which one is the worst?

I mean, we've worked tirelessly to sort and rank teachers in our efforts to root out the bad ones, because apparently "everybody" knows some teachers are bad. Well, apparently everybody knows some tests are bad, so why aren't we tracking them down, sorting them out, and publishing their low test ratings in the local paper?

We've argued relentlessly that I need to be able to compare my student's reading ability with the reading ability of Chris McNoname in Iowa, so why can't I compare the tests that each one is taking?

I realize that coming up with a metric would be really hard, but so what? We use VAM to sort out teachers and it has been debunked by everyone except people who work for the USED. I think we've established that the sorting instrument doesn't have to be good or even valid-- it just has to generate some sort of rating.

So let's get on this. Let's come up with a stack-ranking method for sorting out the SBA and the PARCC and the Keystones and the Indiana Test of Essential Student Swellness and whatever else is out there. If we're going to rate every student and teacher and school, why would we not also rate the raters? And then once we've got the tests rated, we can throw out the bottom ten percent of them. We can offer a "merit bonus" to the company that made the best one (and peanuts to everyone else) because that will reward their excellence and encourage them to do a good job! And for the bottom twenty-five percent of the bad tests, we can call in turnaround experts to take over the company.
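The mechanics of the modest proposal above are, at least, trivial to implement. Here's a tongue-in-cheek sketch (the ratings are invented, since no such metric for tests actually exists): rank by rating, bonus the top, cut the bottom ten percent.

```python
# Stack ranking in miniature: sort by rating, merit-bonus the winner,
# cut the bottom ten percent. All ratings below are invented.
def stack_rank(ratings, cut_fraction=0.10):
    """Return (survivors, cut) after dropping the bottom cut_fraction."""
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    n_cut = max(1, int(len(ranked) * cut_fraction))
    return ranked[:-n_cut], ranked[-n_cut:]

tests = {
    "PARCC": 71,
    "SBA": 68,
    "Keystones": 55,
    "ITESS": 42,  # Indiana Test of Essential Student Swellness
}

survivors, cut = stack_rank(tests)
print("merit bonus:", survivors[0][0])      # merit bonus: PARCC
print("turnaround:", [n for n, _ in cut])   # turnaround: ['ITESS']
```

Note that the sorting works no matter what the ratings mean or whether they're valid-- which, as the VAM experience suggests, is apparently all the rigor required.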

In fact-- why not test choice? If my student wants to take the PARCC instead of the ITESS because the PARCC is rated higher, why shouldn't my student be able to do that? And if I don't like any of them, why shouldn't I be able to create a charter test of my own in order to look out for my child's best interests? We can give every student a little testing voucher and let the money follow them to whatever test they would prefer to take from whatever vendors pop up.

Let's get on this quickly, because I think I've just figured out how to make a few million dollars, and it's going to take at least a weekend to whip up my charter test company product. Let the sorting and comparing and ranking begin!

Monday, February 9, 2015

6 Testing Talking Points

Anthony Cody scored a great little handout last week that is a literal guide to how reformsters want to talk about testing. The handout-- "How To Talk About Testing"-- covers six specific testing arguments and how reformsters should respond to them, broken down into finding common ground, pivoting to a higher emotional place, do's, don'ts, rabbit holes to avoid, and handy approaches for both parents and business folks. Many of these talking points will seem familiar.

But hey-- just because something is a talking point doesn't mean that it's untrue. Let's take a look:

Argument: There's too much testing

Advice: You can't win this one because people mostly think it's true (similar to the way that most people think the earth revolves around the sun). But you can pivot back with the idea that newer, better Common Core tests will fix that, somehow, and also "parents want to know how their kids are doing and they need a [sic] objective measuring stick."

We've been waiting for these newer, better tests for at least a decade. They haven't arrived and they never will. And aren't parents yet tired of the assertion that they are too dopey to know how their children are doing unless a standardized test tells them? How can this still be a viable talking point? Also, objective measuring sticks are great-- unless you're trying to weigh something or measure the density of a liquid or check photon direction in a quantum physics experiment. Tests may well be measuring sticks-- but that doesn't mean they're the tool for the job.

Do tell parents that the new tests will make things better, but don't overpromise (because the new tests won't make a damn bit of difference). Do tell parents to talk to the teacher, but don't encourage them to get all activisty because that would cramp our style-- er, because that will probably scare them, poor dears.

And tell business guys that we're getting lots of accountability bang for our buck. Because who cares if it's really doing the job as long as it's cheap?

Argument: We can't treat schools like businesses

Advice: People don't want to think of schools as cutthroat, but tell them we need to know if the school is getting results. "Parents have a right to know if their kids are getting the best education they can." Then, I guess, cross your fingers and hope that parents don't ask, "So what does this big standardized test have to do with knowing if my child is getting a great education?"

People want results and like accountability (in theory). "Do normalize the practice of measuring performance." Just don't let anybody ask how exactly a standardized test measures the performance of a whole school. But do emphasize how super-important math and reading are, just in case anyone wants to ask how the Big Standardized Test can possibly measure the performance of every other part of the school.

At the same time, try not to make this about the teachers and how their evaluation system is completely out of whack thanks to the completely-debunked idea of VAM (this guide does not mention value-added). Yes, it measures teacher performance, but gosh, we count classroom observation, too. "First and foremost the tests were created to help parents and teachers know if a student is reading and doing math at the level they should."

Yikes-- so many questions should come up in response to this. Like, we've now been told multiple reasons for the test to be given-- is it possible to design a single test that works for all those purposes? Or, who decides what level the students "should" be achieving?

The writer wants you to know that the facts are on your side, because there's a 2012 study from the University of Edinburgh that shows a link between reading and math ability at age seven and social class thirty-five years later. One more useful talking point to use on people who don't understand the difference between correlation and causation.

Argument: It's just more teaching to the test

Advice: A hailstorm of non sequiturs. You should agree with them that teaching to the test is a waste of time, but the new tests are an improvement and finally provide parents with valuable information.

Okay, so not just non sequiturs, but also Things That Aren't True. The writer wants you to argue essentially that new generation tests are close to authentic assessment (though we don't use those words), which is baloney. We also recycle the old line that these tests don't just require students to fill in the blanks with facts they memorized last week. Which is great, I guess, in the same way that tests no longer require students to dip their pens in inkwells.

As always, the test prep counter-argument depends on misrepresenting what test prep means. Standardized tests will always require test prep, because any assessment at all is a measure of tasks that are just like the assessment. Writing an essay is an assessment of how well a student can write an essay. Shooting foul shots is a good assessment of how well a player can shoot foul shots. Answering standardized test questions is an assessment of how well a student answers standardized test questions, and so the best preparation for the test will always be learning to answer similar sorts of test questions under similar test-like conditions, aka test prep.

The business-specific talking point is actually dead-on correct-- "What gets measured gets done!" And what gets measured with a standardized test is the ability to take a standardized test, and therefore teachers and schools are highly motivated to teach students how to take a standardized test. (One might also ask what implications WGMGD has for all the subjects that aren't math and reading.)

The suggested teacher-specific message is hilarious-- "The new tests free teachers to do what they love: create a classroom environment that's about real learning, teaching kids how to get to the answer, not just memorize it." And then after school the children can pedal home on their penny-farthings and stop for strawberry phosphates.

Argument: One size doesn't fit all

This is really the first time the sheet resorts to a straw man, saying of test opponents that "they want parents to feel that their kids are too unique for testing." Nope (nor can one be "too unique" or "more unique" or "somewhat pregnant"). I don't avoid one-size-fits-all hats because I think I'm too special; I just know that they won't fit.

But the advice here is that parents need to know how their kids are doing at reading and math because all success in life depends on reading and math. And they double down on this as well:

There are many different kinds of dreams and aspirations, with one way to get there: reading and math... There isn't much you can do without reading and math... Without solid reading and math skills, you're stuck

And, man-- I am a professional English teacher. It is what I have devoted my life to. But I'll be damned if I would stand in front of any of my classes, no matter how low in ability, and say to them, "You guys read badly, and you are all going to be total failures in life because you are getting a lousy grade in my class." I mean-- I believe with all my heart that reading and writing are hugely important skills, but even I would not suggest that nobody can amount to anything in life without them.

Then there's this:

It's not about standardization. Quite the opposite. It's about providing teachers with another tool, getting them the information they need so they can adapt their teaching and get your kids what they need to reach their full potential.

So here's yet another alleged purpose for the test, on top of the many others listed so far. This is one magical test, but as a parent, I would ask just one question-- When will the test be given, and when will my child's teacher get back the results that will inform these adaptations? As a teacher, I might ask how I'll get test results that will both tell me what I have yet to do this year AND how well I did this year. From the same test! Magical, I'm telling you!


Argument: A drop in scores is proof

I didn't think the drop in test scores was being used as proof of anything by defenders of public ed. We know why there was a drop-- because cut scores were set to ensure it.

Advice: present lower test scores as proof of the awesomeness of these new, improved tests. But hey-- look at this:

We expected the drop in scores. Any time you change a test scores drop. We know that. Anything that's new has a learning curve.

But wait. I thought these new improved tests didn't require any sort of test prep, that they were such authentic measures of what students learn in class that students would just transfer that learning seamlessly to the new tests. Didn't you say that? Because it sounds now like students need a few years to get the right kind of test preparation to do well on these.

Interesting don'ts on this one--don't trot out the need to have internationally competitive standards to save the US economy with college and career ready grads.

Argument: Testing is bad. Period.

Advice: Yes, tests aren't fun. They're not supposed to be. But tests are a part of life. "They let us know we're ready to move on." So, add one more item to the Big List of Things The Test Can Do.

Number one thing to do? Normalize testing. Tests are like annual checkups with measures for height and weight, which I guess is true if all the short kids are flunked and told they are going to fail at life and then the doctors with the most short kids get paid less by the insurance company and given lower ratings. In that case then, yes, testing is just like a checkup.

The writer wants you to sell the value of information, not the gritty character-building experience of testing. It's a good stance because it assumes the sale-- it assumes that the Big Standardized Test is actually collecting real information that means what it says it means, which is a huge assumption with little evidence to back it up.

Look, testing is not universal. Remember when you had to pass your pre-marital spousing test before you could get married, or the pre-parenting test before you could have kids? No, of course not. Nor do CEOs get the job by taking a standardized test that all CEOs must take before they can be hired.

Where testing does occur, it occurs because it has proven to have value and utility. Medical tests are selected because they are deemed appropriate for the specific situation by medical experts, who also have reason to believe that the tests deliver useful information.

Of all the six points, this one is the most genius because it completely skips past the real issue. There are arguments to be made against all testing (Alfie Kohn makes the best ones), but in a world where tests are unlikely to be eradicated, the most important question is, "Is this test any good?" All tests are not created equal. Some are pretty okay. Some are absolute crap. Distinguishing between them is critical.

So there are our six testing talking points. You can peruse the original to find more details-- they're very peppy and have snappy layouts and fonts. They are baloney, but it's baloney in a pretty wrapper in small, easy-to-eat servings. But still baloney.

Sunday, February 8, 2015

Reformster Fallacious Argument Made Simple

It is one of the great fallacies you will frequently encounter in the work of education reform.

I most recently encountered a very striking version of it in a new position-paper-advocacy-research-report-white-paper-thingy from FEE, the reformster group previously working for Jeb Bush and handed over (at least until Bush finishes trying to be President) to Condoleezza Rice.

The report (which I went over in more detail here) wants to make the case for charters and choice in education, and it starts by arguing that soon there will be way too few employed people paying for way too many children and retired geezers, therefore, school choice. The "report" runs to almost 100 pages, and ninety-some of those are devoted to mapping out the severe scariosity of the upcoming crisis. The part that explains how school choice would fix this-- that gets a couple of pages. At its most critical juncture, the argument depends on one previously debunked study.

This is a relatively common fallacious argument structure, but if you are going to spend time in the education debates, it's useful to know it when you see it. The basic outline of the argument looks like this:

1) SOMETHING AWFUL IS GOING TO HAPPEN OH MY GOOD LORD IN HEAVEN LOOOK I EVEN HAVE CHARTS AND GRAPHS AND IT IS SOOOOOOOOO TERRIBLE THAT IT WILL MAKE AWFUL THINGS HAPPEN, REALLY TERRIBLE AWFUL THINGS LET ME TELL YOU JUST HOW AWFUL OH GOD HEAVENS WE MUST ALL BEWARE--- BEEE WAAAARREEEEEEE!!!!!!!!!

2) therefore for some reason

3) You must let me do X to save us!

The trick here is to load up #1 with facts and figures and details and specifics. Make it as facty and credible as you possibly can (even if you need to gin up some fake facts to do it).

#3 is where you load in your PR for whatever initiative you're pushing.

And #2 you just try to skate past as quickly as possible, because #2 is the part that most needs support and proof and fact-like content, but #2 is also the place where you probably don't have any.

In a normal, non-baloney argument, #2 is the strongest point, because the rational, supportable connection between the problem and the solution is what matters most. But if you are selling baloney, that connection is precisely what you don't have. So instead of actual substance in #2, you just do your best to drive up the urgency in #1.

For example:

1) The volcano is gigantic and scary and when lava comes pouring out of it WE ARE ALL GOING TO DIE HOT FLAMING DEATHS AND SUFFOCATE IN ASH AND IT WILL BE TERRIBLE

2) Therefore, for some reason

3) We should sacrifice some virgins

Or:

1) We are falling behind other countries and if we don't get caught back up we will be BEHIND ESTONIA!! ESTONIA!!!! GOOD GOD, WE MUST NOT FALL BEHIND THESE OTHER NATIONS ON THE TOTALLY MADE-UP INTERNATIONAL AWESOMENESS INDEX

2) Therefore, for some reason

3) We should adopt Common Core

You can manufacture the #1 crisis if necessary. But this can be even more effective if you use an actual real problem for #1:

1) Poor and minority children in this country keep getting the short end of the stick educationally, with fewer resources and less opportunity to break out of the cycle of poverty. This is a crappy way for our fellow Americans to have to live, and certainly leaving no pathway out of poverty is a violation of the American dream

2) Therefore, for some reason

3) We should make sure they all have to take a Big Standardized Test every year.

You just have to convey a sense of urgency about #1 and never ever let the conversation drift to #2. If people start trying to ask exactly how #3 actually helps with #1, you just rhetorical question them into silence.

Treat questioning #2 as if it's the same as questioning #1. Can't for the life of you see how the #1 of poverty and under-resourced schools is solved by more charter schools that drain resources from public education and only agree to teach the handful of students that they accept, while remaining unaccountable to anyone? Condoleezza Rice says you're a racist.

But it's #2 where the most important questions lie. Even if I accept that US schools are in some sort of crisis (which I don't, but if), exactly how would Common Core fix that? I do believe that we have a real problem with poverty in this country, but how, exactly, will giving poor kids standardized tests help with that?

If you have a gut feeling that a great deal of the reformster agenda just doesn't make sense, #2 is where the problem mostly lies. Most reformster arguments involve using a loud #1 and a slick #3 to cover up a non-existent #2.

1) Some students score low on Big Standardized Tests-- They GET LOW SCORES! LOW SCORES THAT ARE A BAAAAAAD THING! True, they're a bad thing because we've set up a system of artificial imposed punishments for low scores but hey, still-- LOOOOWWWW SCOOORESSSSSS!!!!!

2) Therefore, for some reason

3) There should be no tenure for teachers

There's no connection at all. We could just as easily say

3) Taxpayers should buy charter operators a pony

3) The National Guard should shoot a badger

3) We should sacrifice a virgin

But of course badgers and ponies and virgins aren't nearly as profitable as charters and tests. That, and I think some folks really believe that #2 is there when it just isn't. Either way, it's important to know what the real connection is before you start sacrificing virgins.

Sampling the PARCC

Today, I'm trying something new. I've gotten myself onto the PARCC sample item site and am going to look at the ELA sample items for high school. This set was updated in March of 2014, so, you know, it's entirely possible they are not fully representative, given that the folks at Pearson are reportedly working tirelessly to improve testing so that new generations of Even Very Betterer Tests can be released into the wild, like so many majestic lion-maned dolphins.

So I'm just going to live blog this in real-ish time, because we know that one important part of measuring reading skill is that it should not involve any time for reflection and thoughtful revisiting of the work being read. No, the Real Readers of this world are all Wham Bam Thank You Madam Librarian, so that's how we'll do this. There appear to be twenty-three sample items, and I have two hours to do this, so this could take a while. You've been warned.

PAGE ONE: DNA

Right off the bat I can see that taking the test on computer will be a massive pain in the ass. Do you remember frames, the website formatting that was universally loathed and rapidly abandoned? This reminds me of that. The reading selection is in its own little window and I have to scroll the reading within that window. The two questions run further down the page, so when I'm looking at the second question, the window with the selection in it is halfway off the screen, so to look back to the reading I have to scroll up in the main window and then scroll up and down in the selection window and then take a minute to punch myself in the brain in frustration.

The selection is about using DNA testing for crops, so fascinating stuff. Part A (what a normal person might call "question 1") asks us to select three out of seven terms used in the selection, picking those that "help clarify" the meaning of the term "DNA fingerprint," so here we are already ignoring the reader's role in reading. If I already understand the term, none of them help (what helped you learn how to write your name today?), and if I don't understand the term, apparently there is only one path to understanding. If I decide that I have to factor in the context in which the phrase is used, I'm back to scrolling in the little window and I rapidly want to punch the test designers in the face. I count at least four possible answers here, but only three are allowed. Three of the choices are the only ones that use the word "genetics"; I will answer this question based on guesswork and trying to second-guess the writer.

Part B is a nonsense question, asking me to come up with an answer based on my first answer.

PAGE TWO: STILL FRICKIN' DNA

Still the same selection. Not getting any better at this scrolling-- whether my mouse roller scrolls the whole page or the selection window depends on where my cursor is sitting.

Part A is, well... hmm. If I asked you, "Explain how a bicycle is like a fish," I would expect an answer from you that mentioned both the bicycle and a fish. But PARCC asks how "solving crop crimes is like solving high-profile murder cases." But all four answers mention only the "crop crime" side of the comparison, and the selection itself says nothing about how high-profile murder cases are solved. So are students supposed to already know how high-profile murder cases are solved? Should they assume that things they've seen on CSI or Law and Order are accurate? To answer this we'll be reduced to figuring out which answer is an accurate summary of the crop crime techniques mentioned in the selection.

This is one of those types of questions that we have to test prep our students for-- how to "reduce" a seemingly complex question to the simpler question. This question pretends to be complex; it is actually asking, "Which one of these four items is actually mentioned in the selection?" It boils down to picky gotcha baloney-- one answer is going to be wrong because it says that crop detectives use computers "at crime scenes."

Part B. The old "which detail best supports" question. If you blew Part A, these answers will be bizarrely random.

PAGE THREE: DNA

Still on this same damn selection. I now hate crops and their DNA.

Part A wants to know what the word "search" means in the heading for the final graph. I believe it means that the article was poorly edited, but that option is not available. The distractor in this set is absolutely true; it requires test-taking skills to eliminate it, not reading skills.

Part B "based on information from the text" is our cue (if we've been properly test prepped) to go look for the answer in the text, which would take a lot less time if not for this furshlugginer set up. The test writers have called for two correct answers, allowing them to pretend that a simple search-and-match question is actually complex.

PAGE FOUR: DNA GRAND FINALE, I HOPE

Ah, yes. A test question that assesses literally nothing useful whatsoever. At the top of the page is our selection in a full-screen width window instead of the narrow cramped one. At the bottom of the page is a list of statements, two of which are actual advantages of understanding crop DNA. Above them are click-and-drag details from the article. You are going to find the two advantages, then drag the supporting detail for each into the box next to it. Once you've done all this, you will have completed a task that does not mirror any real task done by real human beings anywhere in the world ever.

This is so stupid I am not even going to pretend to look for the "correct" answer. But I will remember this page clearly the next time somebody tries to unload the absolute baloney talking point that the PARCC does not require test prep. No students have ever seen questions like this unless a teacher showed them such a thing, and no teacher ever used such a thing in class unless she was trying to get her students ready for a cockamamie standardized test.

Oh, and when you drag the "answers," they often don't fit in the box and just spill past the edges, looking like you've made a mistake.

PAGE FIVE: FOR THE LOVE OF GOD, DNA

Here are the steps listed in the article. Drop and drag them into the same order as in the article. Again, the only thing that makes this remotely difficult is wrestling with the damn windows. This is a matching exercise, proving pretty much nothing.

PAGE SIX: APPARENTLY THIS IS A DNA TEST TEST

By now my lower-level students have stopped paying any attention to the selection and are just trying to get past it to whatever blessed page of the test will show them something else.

Part A asks us to figure out which question is answered by the selection. This is one of the better questions I've seen so far. Part B asks which quote "best" supports the answer for A. I hate these "best" questions, because they reinforce the notion that there is only one immutable approach for any given piece of text. It's the very Colemanian idea that every text represents only a single destination and there is only one road by which to get there. That's simply wrong, and reinforcing it through testing is also wrong. Not only wrong, but a cramped, tiny, sad version of the richness of human understanding and experience.

PAGE SEVEN: SOMETHING NEW

Here comes the literature. First we get 110 lines of Ovid re: Daedalus and Icarus (in a little scrolling window). Part A asks which one of four readings is the correct one for lines 9 and 10 (because reading, interpreting and experiencing the richness of literature is all about selecting the one correct reading). None of the answers are great, particularly if you look at the lines in context, but only one really makes sense. But then Part B asks which other lines support your Part A answer and the answer here is "None of them," though there is one answer for B that would support one of the wrong answers for A, so now I'm wondering if the writers and I are on a different page here.

PAGE EIGHT: STILL OVID

Two more questions focusing on a particular quote, asking for an interpretation and a quote to back it up. You know, when I say it like that, it seems like a perfectly legitimate reading assessment. But when you turn that assessment task into a multiple choice question, you break the whole business. "Find a nice person, get married and settle down," seems like decent-ish life advice, but if you turn it into "Select one of these four people, get married in one of these four ceremonies, and buy one of these four houses" suddenly it's something else.

And we haven't twisted this reading task for the benefit of anybody except the people who sell, administer, score and play with data from these tests.

PAGE NINE: OVID

The test is still telling me that I'm going to read two selections but only showing me one. If I were not already fully prepped for this type of test and test question, I might wonder if something were wrong with my screen. So, more test prep required.

Part A asks what certain lines "most" suggest about Daedalus, as if that is an absolute objective thing. Then you get to choose what exact quotes (two, because that makes it more complex) back you up. This is not constructing an interpretation of a piece of literature. Every one of these questions makes me angrier as a teacher of literature and reading.

PAGE TEN: ON TO SEXTON

Here's our second poem-- "To a Friend Whose Work Has Come To Triumph." The two questions are completely bogus-- Sexton has chosen the word "tunneling," which is a great choice in both its complexity and duality of meaning, a perfect image for the moment she's describing. But of course in test land the word choice only "reveals" one thing, and only one other piece of the poem keys that single meaning. I would call this poetry being explained by a mechanic, but that's disrespectful to mechanics.

PAGE ELEVEN: MORE BUTCHERY

Determine the central idea of Sexton's poem, as well as specific details that develop the idea over the course of the poem. From the list of Possible Central Ideas, drag the best Central Idea into the Central Idea box.

Good God! This at least avoids making explicit what is implied here-- "Determine the central idea, then look for it on our list. If it's not there, you're wrong." Three of the four choices are okay-ish, two are arguable, and none would impress me if they came in as part of a student paper.

I'm also supposed to drag-and-drop three quotes that help develop the One Right Idea. So, more test prep required.

PAGE TWELVE: CONTRAST

Now my text window has tabs to toggle back and forth between the two works. I'm supposed to come up with a "key" difference between the two works (from their list of four, of course) and two quotes to back up my answer. Your answer will depend on what you think "key" means to the test writers. Hope your teacher did good test prep with you.

PAGE THIRTEEN: ESSAY TIME

In this tiny text box that will let you view about six lines of your essay at a time, write an essay "that provides an analysis of how Sexton transforms Daedalus and Icarus." Use evidence from both texts. No kidding-- this text box is tiny. And no, you can't cut and paste quotes directly from the texts.

But the big question here-- who is going to assess this, and on what basis? Somehow I don't think it's going to be a big room full of people who know both their mythology and their Sexton.

PAGE FOURTEEN: ABIGAIL ADAMS

So now we're on to biography. It's a selection from the National Women's History Museum, so you know it is going to be a vibrant and exciting text. I suppose it could be worse--we could be reading from an encyclopedia.

The questions want to know what "advocate for women" means, and to pick an example of Adams being an advocate. In other words, the kinds of questions that my students would immediately ID as questions that don't require them to actually read the selection.

PAGE FIFTEEN: ADAMS

This page wants to know which question goes unanswered by the selection, and then for Part B asks to select a statement that is true about the biography but which supports the answer for A. Not hopelessly twisty.

PAGE SIXTEEN: MORE BIO

Connect the two central ideas of this selection. So, figure out what the writers believe are the two main ideas, and then try to figure out what they think the writers see as a connection. Like most of these questions, these will be handled backwards. I'm not going to do a close reading of the selection-- I'm going to close read the questions and answers and then use the selection just as a set of clues about which answer to pick. And this is how answering multiple choice questions about a short selection is a task not much like authentic reading or pretty much any other task in the world.

PAGE SEVENTEEN: ABIGAIL LETTER

Now we're going to read the Adams family mail. This is one of her letters agitating for the rights of women; our questions will focus on her use of "tyrant" based entirely on the text itself, because no conversation between Abigail and John Adams mentioning tyranny in 1776 could possibly be informed by any historical or personal context.

PAGE EIGHTEEN: STILL VIOLATING FOUNDING FATHER & MOTHER PRIVACY

Same letter. Now I'm supposed to decide what the second graph most contributes to the text as a whole. Maybe I'm just a Below Basic kind of guy, but I am pretty sure that the correct answer is not among the four choices. That just makes it harder to decide which other two paragraphs expand on the idea of graph #2.

PAGE NINETEEN: BOSTON

Now we'll decide what her main point about Boston is in the letter. This is a pretty straightforward and literal reading for details kind of question. Maybe the PARCC folks are trying to boost some morale on the home stretch here.

Oh hell. I have a message telling me I have less than five minutes left.

PAGE TWENTY: JOHN'S TURN

Now we have to pick the paraphrase of a quote from Adams that the test writers think is the berries. Another set of questions that do not require me to actually read the selection, so thank goodness for small favors.

PAGE TWENTY-ONE: MORE JOHN

Again, interpretation and support. Because making sense out of colonial letter-writing English is just like current reading. I mean, we've tested me on a boring general science piece, classical poetry, modern poetry, and a pair of colonial letters. Does it seem like that sampling should tell us everything there is to know about the full width and breadth of student reading ability?

PAGE TWENTY-TWO: BOTH LETTERS

Again, in one page, we have two sets of scrollers, tabs for toggling between works, and drag and drop boxes for the answers. Does it really not occur to these people that there are students in this country who rarely-if-ever lay hands on a computer?

This is a multitask page. We're asking for a claim made by the writer and a detail to back up that claim, but we're doing both letters on the same page and we're selecting ideas and support only from the options provided by the test. This is not complex. It does not involve any special Depth of Knowledge. It's just a confusing mess.

PAGE TWENTY-THREE: FINAL ESSAY

Contrast the Adamses' views of freedom and independence. Support your response with details from the three sources (yes, we've got three tabs now). Write it in this tiny text box.

Do you suppose that somebody's previous knowledge of John and Abigail and the American Revolution might be part of what we're inadvertently testing here? Do you suppose that the readers who grade these essays will themselves be history scholars and writing instructors? What, if anything, will this essay tell us about the student's reading skills?

DONE

Man. I have put this off for a long time because I knew it would give me a rage headache, and I was not wrong. How anybody can claim that the results from a test like this would give us a clear, nuanced picture of student reading skills is beyond my comprehension. Unnecessarily complicated, heavily favoring students who have prior background knowledge, and absolutely demanding that test prep be done with students, this is everything one could want in an inauthentic assessment that provides those of us in the classroom with little or no actual useful data about our students.

If this test came as part of a packaged bunch of materials for my classroom, it would go in the Big Circular File of publishers' materials that I never, ever use because they are crap. What a bunch of junk. If you have stuck it out with me here, God bless you. I don't recommend that you give yourself the full PARCC sample treatment, but I heartily recommend it to every person who declares that these are wonderful tests that will help revolutionize education. Good luck to them as well.



Saturday, February 7, 2015

Aldeman in NYT: Up Is Down

In Friday's New York Times, Chad Aldeman of Bellwether offered a defense of annual testing that is a jarring masterpiece of backwards speak, a string of words that are presented as if they mean the opposite of what they say. Let me hit the highlights.

The idea of less testing with the same benefits is alluring.

Nicely played, because it assumes that we are getting some benefits out of the current annual testing. We are not. Not a single one. The idea of less testing is alluring because the Big Standardized Test is a waste of time, and less testing means less time wasting.

Yes, test quality must be better than it is today.

Other than that, Mrs. Lincoln, how did you like the play? Again, this assumes that there is some quality in the tests currently being used. There is not. They don't need to be improved. They need to be scrapped.

And, yes, teachers and parents have a right to be alarmed when unnecessary tests designed only for school benchmarking or teacher evaluations cut into instructional time. 

A mishmosh of false assumptions. First, there are no "necessary" tests, nor have I ever read a convincing description of what a "necessary" test would be or what would make it "necessary." And while there are no Big Standardized Tests that are actually designed for school benchmarking and teacher evaluation, in many states that is the only purpose of the BS Test! The only one! So in Aldeman's view, would those tests be okay because they are being used for purposes for which they aren't designed?

But annual testing has tremendous value. It lets schools follow students’ progress closely, and it allows for measurement of how much students learn and grow over time, not just where they are in a single moment.

Wait! What? A test is, in fact, a single snapshot from a single day or couple of days-- how does that not just give a picture of where students are at a single moment? Taking a single moment from four or five consecutive years does not let anybody follow students' progress closely. This style of measurement is great for measuring student height-- and nothing else. This is like saying that the best way to assess the health of your marriage is to give your spouse a quiz one day a year.

Aldeman follows with several paragraphs pushing the disaggregation argument-- that by forcing schools to measure particular groups, somebody somewhere gets a better picture of how the school is doing. It is, as always, unclear who needs this picture. You're the parent of a child in one of the groups. You believe your child is getting a good education or a bad education based on what you know about your child. How does getting disaggregated data from the school change your understanding?

Besides, I thought we said a few paragraphs back that tests for measuring the school were bad and to be thrown out?

And of course that entire argument rests on the notion that the BS Test measures educational quality and there is not a molecule of evidence out there that it does so. Not. One. Molecule.

Coincidentally, the push for limiting testing has sprung up just as we’re on the cusp of having new, better tests. The Obama administration has invested $360 million and more than four years in the development of new tests, which will debut this spring. Private testing companies have responded with new offerings as well.

Oh, bullshit. New, better tests have been coming every year for a decade. They have never arrived. They will never arrive. It is not possible to create a mass-produced, mass-graded, standardized test that will measure the educational quality of every school in the country. It is like trying to use a ruler to measure the weight of a fluid-- I don't care how many times you go back to the drawing board with the ruler-- it will never do the job. Educational quality cannot be measured by a standardized test. It is the wrong tool for the job, and no amount of redesign will change that.

Good reminder though that while throwing money at public schools is terrible and stupid, throwing money at testing companies is guaranteed awesome.

Annual standardized testing measures one thing-- how well a group of students does at taking an annual standardized test. That's it. Even Aldeman here avoids saying what exactly it is that these tests (you know, the "necessary ones") are supposed to measure.

Annual standardized testing is good for one other thing-- making testing companies a buttload of money. Beyond that, it is simply a waste of time and effort.