CURMUDGUCATION: test prep

Showing posts with label test prep. Show all posts

Wednesday, March 2, 2016

Ace That Test? I Think Not.

The full court press for the Big Standardized Test is on, with all manner of spokespersons and PR initiatives trying to convince Americans to welcome the warm, loving embrace of standardized testing. Last week the Boston Globe brought us Yana Weinstein and Megan Smith, a pair of psychology assistant professors who have co-founded Learning Scientists, which appears to be mostly a blog that they've been running for about a month. And say what you like-- they do not appear to be slickly or heavily funded by the Usual Gang of Reformsters.

Their stated goals include lessening test anxiety and decreasing the negative views of testing. And the reliably reformy Boston Globe gave them a chance to get their word out. Additionally, the pair blogged about additional material that did not make it through the Globe's edit.

The Testing Effect

Weinstein and Smith are fond of "the testing effect" a somewhat inexact term used to refer to the notion that recalling information helps people retain it. It always makes me want a name for whatever it is that makes some people believe that the only situation in which information is recalled is a test. Hell, it could be called the teaching effect, since we can get the same thing going by having students teach a concept to the rest of the class. Or the writing effect, or the discussion effect. There are many ways to have students sock information in place by recalling it; testing is neither the only or the best way to go about it.

Things That Make the Learning Scientists Feel Bad

From their blog, we learn that the LS team feels "awkward" when reading anti-testing writing, and they link to an example from Diane Ravitch. Awkward is an odd way to feel, really. But then, I think their example of a strong defense of testing is a little awkward. They wanted to quote a HuffPost pro-testing piece from Charles Coleman that, they say, addresses problems with the opt out movement "eloquently."

"To put it plainly: white parents from well-funded and highly performing areas are participating in petulant, poorly conceived protests that are ultimately affecting inner-city blacks at schools that need the funding and measures of accountability to ensure any hope of progress in performance." -- Charles F. Coleman Jr.

Ah. So opt outers are white, rich, whiny racists. That is certainly eloquent and well-reasoned support of testing. And let's throw in the counter-reality notion that testing helps poor schools, though after over a decade of test-driven accountability, you'd think supporters could rattle off a list of schools that A) nobody knew were underfunded and underresourced until testing and B) received an boost through extra money and resources after testing. Could it be that no such list actually exists?

Tests Cause Anxiety

The LS duo wants to decrease test anxiety by hammering students with testing all the time, so that it's no longer a big deal. I believe that's true, but not a good idea. Also, parents and teachers should stop saying bad things about the BS Tests, but just keep piling on the happy talk so that students can stop worrying and learn to love the test. All of this, of course, pre-supposes that the BS Tests are actually worthwhile and wonderful and that all the misgivings being expressed by professional educators and the parents of the children is-- what? An evil plot? Widespread confusion? The duo seem deeply committed to not admitting that test critics have any point at all. Fools, the lot of them.

Teaching to the Test

The idea that teaching to a test isn’t really teaching implies an almost astounding assumption that standardized tests are filled with meaningless, ill-thought-out questions on irrelevant or arbitrary information. This may be based on the myth that “teachers in the trenches” are being told what to teach by some “experts” who’ve probably never set foot in a “real” classroom.

Actually, it's neither "astounding" nor an "assumption," but, at least in the case of this "defiant" teacher (LS likes to use argument by adjective), my judgment of the test is based on looking at the actual test and using my professional judgment. It's a crappy test, with poorly-constructed questions that, as is generally the case with a standardized test, mostly test the student's ability to figure out what the test manufacturer wants the student to choose for an answer (and of course the fact that students are selecting answers rather than responding to open ended prompts further limits the usefulness of the BS Test).

But LS assert that tests are actually put together by testing experts and well-seasoned real teachers (and you can see the proof in a video put up by a testing manufacturer about how awesome that test manufacturer is, so totally legit). LS note that "defiant teachers" either "fail to realize" this or "choose to ignore" it. In other words, teachers are either dumb or mindlessly opposed to the truth.

Standardized Tests Are Biased

The team notes that bias is an issue with standardized tests, but it's "highly unlikely" that classroom teachers could do any better, so there. Their question-- if we can't trust a big board of experts to come up with an unbiased test, how can we believe that an individual wouldn't do even worse, and how would we hold them accountable?

That's a fair question, but it assumes some purposes for testing that are not in evidence. My classroom tests are there to see how my students have progressed with and grasped the material. I design those materials with my students in mind. I don't, as BS Tests often do, assume that "everybody knows about" the topic of the material, because I know the everybody's in my classroom, so I can make choices accordingly. I can also select prompts and test material that hook directly into their culture and background.

In short, BS Testing bias enters largely because the test is designed to fit an imaginary Generic Student who actually represents the biases of the test manufacturers, while my assessments are designed to fit the very specific group of students in my room. BS Tests are one-size-fits-all. Mine are tailored to fit.

Reformsters may then say, "But if yours are tailored to fit, how can we use them to compare your students to students across the nation." To which I say, "So what?" You'll need to convince me that there is an actual need to closely compare all students in the nation.

Tests Don't Provide Prompt Feedback

The duo actually agree that test "have a lot of room for improvement." They even acknowledge that the feedback from the test is not only late, but generally vague and useless. But hey-- tests are going to be totes better when they are all online, an assertion that makes the astonishing assumption that there is no difference between a paper test and a computer test except how the students record their answers.

Big Finish

The wrap up is a final barrage of Wrong Things.

Standardized tests were created to track students’ progress and evaluate schools and teachers.

Were they? Really? Is it even possible to create a single test that can actually be used for all those purposes? Because just about everyone on the planet not financially invested in the industry has pointed out that using test results to evaluate teachers via VAM-like methods is baloney. And tests need to be manufactured for a particular purpose-- not three or four entirely different ones. So I call shenanigans-- the tests were not created to both measure and track all three of those things.

Griping abounds about how these tests are measuring the wrong thing and in the wrong way; but what’s conspicuously absent is any suggestion for how to better measure the effect of education — i.e., learning — on a large scale.

A popular reformster fallacy. If you walk into my hospital room and say, "Well, your blood pressure is terrible, so we are going to chop off your feet," and then I say, "No, I don't want you to chop off my feet. I don't believe it will help, and I like my feet," your appropriate response is not, "Well, then, you'd better tell me what else you want me to chop off instead.

In other words, what is "conspicuously absent" is evidence that there is a need for or value in measuring the effects of education on a large scale. Why do we need to do that? If you want to upend the education system for that purpose, the burden is on you to prove that the purpose is valid and useful.

In the absence of direct measures of learning, we resort to measures of performance.

Since we can't actually measure what we want to measure, we'll measure something else as a proxy and talk about it as if it's the same thing. That is one of the major problems with BS Testing in a nutshell.

And the great thing is: measuring this learning actually causes it to grow.

And weighing the pig makes it heavier. This is simply not true, "testing effect" notwithstanding.

PS

Via the blog, we know that they wanted to link to this post at Learning Spy which has some interesting things to say about the difference between learning and performance, including this:

And students are skilled at mimicking what they think teachers want to see and hear. This mimicry might result in learning but often doesn’t.

That's a pretty good explanation of why BS Tests are of so little use-- they are about learning to mimic the behavior required by test manufacturers. But the critical difference between that mimicry on a test and in my classroom is that in my classroom, I can watch for when students are simply mimicking and adjust my instruction and assessment accordingly. A BS Tests cannot make any such adjustments, and cannot tell the difference between mimicry and learning at all.

The duo notes that their post is "controversial," and it is in the sense that it's more pro-test baloney, but I suspect that much of their pushback is also a reaction to their barely-disguised disdain for classroom teachers who don't agree with them. They might also consider widening their tool selection ("when your only tool is a hammer, etc...") to include a broader range of approaches beyond the "test effect." It's a nice trick, and it has its uses, but it's a lousy justification for high stakes BS Testing.

Wednesday, August 26, 2015

TN: We've Found the Unicorn Farm

Tennessee officials are cheerfully announcing the advent of awesome new assessments that will be "not a test you can game." The tests will be delivered at the end of the year by winged unicorns pooping rainbows while playing a Brahms lullabye on the spoons.

“We’re moving into a better test that will provide us better information about how well our students are prepared for post-secondary,” Tennessee Education Commissioner Candice McQueen told reporters recently during sneak peek at some of the questions.

Fans of the new standardized tests continue to be impressed at how these tests solve the problems of education in the 1970's. Assistant state commissioner of data and research Nokia Townes says that the tests will require "more than rote memorization," which puts them on a par with the tests I started giving my students thirty-five years ago. But that statement also indicates that officials still don't understand what "gaming the test" means.

Standardized tests are, by their nature, games. Where you have multiple choice questions, you have one correct answer and several other answers designed to trick and trap students who might make a particular mistake, so by their nature, they are not simply trying to capture a particular correct behavior, but are also testing for several incorrect ones.

But Tennessee officials are distracted themselves, focusing on unimportant test features. The questions have drop and drag! Which is, of course, simply a bubble test question with dragging instead of clicking or bubbling. The questions require multiple correct answers! Which is just a bubble test with more options and two bubbles to hit.

Because standardized tests are a game, gaming them will always be possible. The tests involve plenty of tricks and traps, and so we teach students how to identify those and spot when testers are trying to sucker them in a particular way. We learn specialized testing vocabulary (this is what "mood" means to test manufacturers). We learn what sorts of things to scan for in response to certain types of questions. In short, we teach students a variety of skills that have no application other than taking the Big Standardized Test.

Nothing has changed, really. The advantage of a standardized multiple choice test is that it can be scored quickly and cheaply. The problem is that it can't measure much of any depth. At its worst in the bad old days when Hector and I were pups, it measured recall. In the new and improved days since, it measures whether or not the student will fall for particular tricks and traps-- which may or may not have anything to do with how well the student understands and applies the understanding. And since our focus under the Core is almost entirely on performing certain operations and not at all on content, we're not really testing a student on what she knows, but on whether she can perform the required trick (of course, we're also testing whether or not she's willing to perform the trick, but we never, ever have that discussion).

The article says that the "oldest and most potent criticism of tests" is that "they force teachers to 'teach to the test' and focus unduly on memorizing facts and testing tricks that students promptly forget after completing the test."

Memorizing facts? Maybe. Testing tricks? Those will always and forever be part of test prep for standardized tests because those tests must be speedy and cheap, and the only way to do testing speedy and cheap is by building it out of testing tricks. And any testing tricks that manufacturers can use to create a test, students can learn to take it.

Monday, February 9, 2015

6 Testing Talking Points

Anthony Cody scored a great little handout last week that is a literal guide to how reformster want to talk about testing. The handout-- "How To Talk About Testing"-- covers six specific testing arguments and how reformsters should respond to them, broken down into finding common ground, pivoting to a higher emotional place, do's, don'ts, rabbit holes to avoid, and handy approaches for both parents and business folks. Many of these talking points will seem familiar.

But hey-- just because something is a talking point doesn't mean that it's untrue. Let's take a look:

Argument: There's too much testing

Advice: You can't win this one because people mostly think it's true (similar to the way that most people think the earth revolves around the sun). But you can pivot back with the idea that newer, better Common Core tests will fix that, somehow, and also "parents want to know how their kids are doing and they need a [sic] objective measuring stick."

We've been waiting for these newer, better tests for at least a decade. They haven't arrived and they never will. And aren't parents yet tired of the assertion that they are too dopey to know how their children are doing unless a standardized test tells them? How can this still be a viable talking point? Also, objective measuring sticks are great-- unless you're trying to weigh something or measure the density of a liquid or check photon direction in a quantum physics experiment. Tests may well be measuring sticks-- but that doesn't mean they're the tool for the job.

Do tell parents that the new tests will make things better, but don't overpromise (because the new tests won't make a damn bit of difference). Do tell parents to talk to the teacher, but don't encourage them to get all activisty because ~~that would cramp our style~~ because that will probably scare them, poor dears.

And tell business guys that we're getting lots of accountability bang for our buck. Because who cares if it's really doing the job as long as it's cheap?

Argument: We can't treat schools like businesses

Advice: People don't want to think of schools as cutthroat, but tell them we need to know if the school is getting results. "Parents have a right to know if their kids are getting the best education they can." Then, I guess, cross your fingers and hope that parents don't ask, "So what does this big standardized test have to do with knowing if my child is getting a great education?"

People want results and like accountability (in theory). "Do normalize the practice of measuring performance." Just don't let anybody ask how exactly a standardized test measures the performance of a whole school. But do emphasize how super-important math and reading are, just in case anyone wants to ask how the Big Standardized Test can possibly measure the performance of every other part of the school.

At the same time, try not to make this about the teachers and how their evaluation system is completely out of whack thanks to the completely-debunked idea of VAM (this guide does not mention value-added). Yes, it measures teacher performance, but gosh, we count classroom observation, too. "First and foremost the tests were created to help parents and teachers know if a student is reading and doing math at the level they should."

Yikes-- so many questions should come up in response to this. Like, we've now been told multiple reasons for the test to be given-- is it possible to design a single test that works for all those purposes? Or, who decides what level the students "should" be achieving?

The writer wants you to know that the facts are on your side, because there's a 2012 study that shows a link between 7 year old reading and math ability and social class thirty-five years later. From the University of Edinburgh. One more useful talking point to use on people who don't understand the difference between correlation and causation.

Argument: It's just more teaching to the test

Advice: A hailstorm of non-sequitors. You should agree with them that teaching to the test is a waste of time, but the new tests are an improvement and finally provide parents with valuable information.

Okay, so not just non-sequitors, but also Things That Aren't True. The writer wants you to argue essentially that new generation tests are close to authentic assessment (though we don't use those words), which is baloney. We also recycle the old line that these tests don't just require students to fill in the blanks with facts they memorized last week. Which is great, I guess, in the same way that tests no longer require students to dip their pens in inkwells.

As always, the test prep counter-argument depends on misrepresenting what test prep means. Standardized tests will always require test prep, because any assessment at all is a measure of tasks that are just like the assessment. Writing an essay is an assessment of how well a student can write an essay. Shooting foul shots is a good assessment of how well a player can shoot foul shots. Answering standardized test questions is an assessment of how well a student answers standardized test questions, and so the best preparation for the test will always be learning to answer similar sorts of test questions under similar test-like conditions, aka test prep.

The business-specific talking point is actually dead-on correct-- "What gets measured gets done!" And what gets measured with a standardized test is the ability to take a standardized test, and therefor teachers and schools are highly motivated to teach students how to take a standardized tests. (One might also ask what implications WGMGD has for all the subjects that aren't math and reading.)

The suggestion for teacher-specific message is hilarious-- "The new tests free teachers to do what they love: create a classroom environment that's about real learning, teaching kids how to get to the answer, not just memorize it." And then after school the children can pedal home on their pennyfarthings and stop for strawberry phosphates.

Argument: One size doesn't fit all

This is really the first time the sheet resorts to a straw man, saying of test opponents that "they want parents to feel that their kids are too unique for testing." Nope (nor can one be "too unique" or "more unique" or "somewhat pregnant"). I don't avoid one-size-fits-all hats because I think I'm too special; I just know that they won't fit.

But the advice here is that parents need to know how their kids are doing at reading and math because all success in life depends on reading and math. And they double down on this as well:

There are many different kinds of dreams and aspirations, with one way to get there: reading and math... There isn't much you can do without reading and math... Without solid reading and math skills, you're stuck

And, man-- I am a professional English teacher. It is what I have devoted my life to. But I'll be damned if I would stand in front of any of my classes, no matter how low in ability, and say to them, "You guys read badly, and you are all going to be total failures in life because you are getting a lousy grade in my class." I mean-- I believe with all my heart that reading and writing are hugely important skills, but even I would not suggest that nobody can amount to anything in life without them.

Then there's this:

It's not about standardization. Quite the opposite. It's about providing teachers with another tool, getting them the information they need so they can adapt their teaching and get your kids what they need to reach their full potential.

So here's yet another alleged purpose for the test, on top of the many others listed so far. This is one magical test, but as a parent, I would ask just one question-- When will the test be given, and when will my child's teacher get back the results that will inform these adaptations? As a teacher, I might ask how I'll get test results that will both tell me what I have yet to do this year AND how well I did this year. From the same test! Magical, I'm telling you!

Argument: A drop in scores is proof

I didn't think the drop in test scores was being used as proof of anything by defenders of public ed. We know why there was a drop-- because cut scores were set to insure it.

Advice: present lower test scores as proof of the awesomeness of these new, improved tests. But hey-- look at this:

We expected the drop in scores. Any time you change a test scores drop. We know that. Anything that's new has a learning curve.

But wait. I thought these new improved tests didn't require any sort of test prep, that they were such authentic measures of what students learn in class that students would just transfer that learning seamlessly to the new tests. Didn't you say that? Because it sounds now like students need a few years to get the right kind of test preparation do well on these.

Interesting don'ts on this one--don't trot out the need to have internationally competitive standards to save the US economy with college and career ready grads.

Argument: Testing is bad. Period.

Advice: Yes, tests aren't fun. They're not supposed to be. But tests are a part of life. "They let us know we're ready to move on." So, add one more item to the Big List of Things The Test Can Do.

Number one thing to do? Normalize testing. Tests are like annual checkups with measures for height and weight, which I guess is true if all the short kids are flunked and told they are going to fail at life and then the doctors with the most short kids get paid less by the insurance company and given lower ratings. In that case then, yes, testing is just like a checkup.

The writer wants you to sell the value of information, not the gritty character-building experience of testing. It's a good stance because it assumes the sale-- it assumes that the Big Standardized Test is actually collecting real information that means what it says it means, which is a huge assumption with little evidence to back it up.

Look, testing is not universal. Remember when you had to pass your pre-marital spousing test before you could get married, or the pre-parenting test before you could have kids? No, of course not. Nor do CEO's get the job by taking a standardized test that all CEO's must take before they can be hired.

Where testing does occur, it occurs because it has proven to have value and utility. Medical tests are selected because they are deemed appropriate for the specific situation by medical experts, who also have reason to believe that the tests deliver useful information.

Of all the six points, this one is the most genius because it complete skips past the real issue. There are arguments to be made against all testing (Alfie Kohn makes the best ones), but in a world where tests are unlikely to be eradicated, the most important question is, "Is this test any good?" All tests are not created equal. Some are pretty okay. Some are absolute crap. Distinguishing between them is critical.

So there are our six testing talking points. You can peruse the original to find more details-- they're very peppy and have snappy layouts and fonts. They are baloney, but it's baloney in a pretty wrapper in small, easy-to-eat servings. But still baloney.

Sunday, February 8, 2015

Sampling the PARCC

Today, I'm trying something new. I've gotten myself onto the PARCC sample item site and am going to look at the ELA sample items for high school. This set was updated in March of 2014, so, you know, it's entirely possible they are not fully representative, given that the folks at Pearson are reportedly working tirelessly to improve testing so that new generations of Even Very Betterer Tests can be released into the wild, like so many majestic lion-maned dolphins.

So I'm just going to live blog this in real-ish time, because we know that one important part of measuring reading skill is that it should not involve any time for reflection and thoughtful revisiting of the work being read. No, the Real Readers of this world are all Wham Bam Thank You Madam Librarian, so that's how we'll do this. There appear to be twenty-three sample items, and I have two hours to do this, so this could take a while. You've been warned.

PAGE ONE: DNA

Right off the bat I can see that taking the test on computer will be a massive pain in the ass. Do you remember frames, the website formatting that was universally loathed and rapidly abandoned? This reminds me of that. The reading selection is in its own little window and I have to scroll the reading within that window. The two questions run further down the page, so when I'm looking at the second question, the window with the selection in it is halfway off the screen, so to look back to the reading I have to scroll up in the main window and then scroll up and down in the selection window and then take a minute to punch myself in the brain in frustration.

The selection is about using DNA testing for crops, so fascinating stuff. Part A (what a normal person might call "question 1") asks us to select three out of seven terms used in the selection, picking those that "help clarify" the meaning of the term "DNA fingerprint," so here we are already ignoring the reader's role in reading. If I already understand the term, none of them help (what helped you learn how to write your name today?), and if I don't understand the term, apparently there is only one path to understanding. If I decide that I have to factor in the context in which the phrase is used, I'm back to scrolling in the little window and I rapidly want to punch the test designers in the face. I count at least four possible answers here, but only three are allowed. Three of them are the only answers to use "genetics" in the answer; I will answer this question based on guesswork and trying to second guess the writer.

Part B is a nonsense question, asking me to come up with an answer based on my first answer.

PAGE TWO: STILL FRICKIN' DNA

Still the same selection. Not getting any better at this scrolling-- whether my mouse roller scrolls the whole page or the selection window depends on where my cursor is sitting.

Part A is, well... hmm. If I asked you, "Explain how a bicycle is like a fish," I would expect an answer from you that mentioned both the bicycle and a fish. But PARCC asks how "solving crop crimes is like solving high-profile murder cases." But all four answers mention only the "crop crime" side of the comparison, and the selection itself says nothing about how high-profile murder cases are solved. So are students supposed to already know how high-profile murder cases are solved? Should they assume that things they've seen on CSI or Law and Order are accurate? To answer this we'll be reduced to figuring out which answer is an accurate summary of the crop crime techniques mentioned in the selection.

This is one of those types of questions that we have to test prep our students for-- how to "reduce" a seemingly complex question to the simpler question. This question pretends to be complex; it is actually asking, "Which one of these four items is actually mentioned in the selection?" It boils down to picky gotcha baloney-- one answer is going to be wrong because it says that crop detectives use computers "at crime scenes"

Part B.The old "which detail best supports" question. If you blew Part A, these answers will be bizarrely random.

PAGE THREE: DNA

Still on this same damn selection. I now hate crops and their DNA.

Part A wants to know what the word "search" means in the heading for the final graph. I believe it means that the article was poorly edited, but that selection is not available. The distractor in this set is absolutely true; it requires test-taking skills to eliminate it, not reading skills.

Part B "based on information from the text" is our cue (if we've been properly test prepped) to go look for the answer in the text, which would take a lot less time if not for this furshlugginer set up. The test writers have called for two correct answers, allowing them to pretend that a simple search-and-match question is actually complex.

PAGE FOUR: DNA GRAND FINALE, I HOPE

Ah, yes. A test question that assesses literally nothing useful whatsoever. At the top of the page is our selection in a full-screen width window instead of the narrow cramped one. At the bottom of the page is a list of statements, two of which are actual advantages of understanding crop DNA. Above them are click-and-drag details from the article. You are going to find the two advantages, then drag the supporting detail for each into the box next to it. Once you've done all this, you will have completed a task that does not mirror any real task done by real human beings anywhere in the world ever.

This is so stupid I am not even going to pretend to look for the "correct" answer. But I will remember this page clearly the next time somebody tries to unload the absolute baloney talking point that the PARCC does not require test prep. No students have ever seen questions like this unless a teacher showed them such a thing, and no teacher ever used such a thing in class unless she was trying to get her students ready for a cockamamie standardized test.

Oh, and when you drag the "answers," they often don't fit in the box and just spill past the edges, looking like you've made a mistake.

PAGE FIVE: FOR THE LOVE OF GOD, DNA

Here are the steps listed in the article. Drop and drag them into the same order as in the article. Again, the only thing that makes this remotely difficult is wrestling with the damn windows. This is a matching exercise, proving pretty much nothing.

PAGE SIX: APPARENTLY THIS IS A DNA TEST TEST

By now my lower-level students have stopped paying any attention to the selection and are just trying to get past it to whatever blessed page of the test will show them something else.

Part A asks us to figure out which question is answered by the selection. This is one of the better questions I've seen so far. Part B asks which quote "best" supports the answer for A. I hate these "best" questions, because they reinforce the notion that there is only one immutable approach for any given piece of text. It's the very Colemanian idea that every text represents only a single destination and there is only one road by which to get there. That's simply wrong, and reinforcing it through testing is also wrong. Not only wrong, but a cramped, tiny, sad version of the richness of human understanding and experience.

PAGE SEVEN: SOMETHING NEW

Here comes the literature. First we get 110 lines of Ovid re: Daedelus and Icarus (in a little scrolling window). Part A asks which one of four readings is the correct one for lines 9 and 10 (because reading, interpreting and experiencing the richness of literature is all about selecting the one correct reading). None of the answers are great, particularly if you look at the lines in context, but only one really makes sense. But then Part B asks which other lines support your Part A answer and the answer here is "None of them," though there is one answer for B that would support one of the wrong answers for A, so now I'm wondering if the writers and I are on a different page here.

PAGE EIGHT: STILL OVID

Two more questions focusing on a particular quote, asking for an interpretation and a quote to back it up. You know, when I say it like that, it seems like a perfectly legitimate reading assessment. But when you turn that assessment task into a multiple choice question, you break the whole business. "Find a nice person, get married and settle down," seems like decent-ish life advice, but if you turn it into "Select one of these four people, get married in one of these four ceremonies, and buy one of these four houses" suddenly it's something else.

And we haven't twisted this reading task for the benefit of anybody except the people who sell, administer, score and play with data from these tests.

PAGE NINE: OVID

The test is still telling me that I'm going to read two selections but only showing me one. If I were not already fully prepped for this type of test and test question, I might wonder if something were wrong with my screen. So, more test prep required.

Part A asks what certain lines "most" suggest about Daedelus, as if that is an absolute objective thing. Then you get to choose what exact quotes (two, because that makes it more complex) back you up. This is not constructing and interpretation of a piece of literature. Every one of these questions makes me angrier as a teacher of literature and reading.

PAGE TEN: ON TO SEXTON

Here's our second poem-- "To a Friend Whose Work Has Come To Triumph." The two questions are completely bogus-- Sexton has chosen the word "tunneling" which is a great choice in both its complexity and duality of meaning, a great image for the moment she's describing. But of course in test land the word choice only "reveals" one thing, and only one other piece of the poem keys that single meaning. I would call this poetry being explained by a mechanic, but that's disrespectful to mechanics.

PAGE ELEVEN: MORE BUTCHERY

Determine the central idea of Sexton's poem, as well as specific details that develop the idea over the course of the poem. From the list of Possible Central Ideas, drag the best Central Idea into the Central Idea box.

Good God! This at least avoids making explicit what is implied here-- "Determine the central idea, then look for it on our list. If it's not there, you're wrong." Three of the four choices are okay-ish, two are arguable, and none would impress me if they came in as part of a student paper.

I'm also supposed to drag-and-drop three quotes that help develop the One Right Idea. So, more test prep required.

PAGE TWELVE: CONTRAST

Now my text window has tabs to toggle back and forth between the two works. I'm supposed to come up with a "key" difference between the two works (from their list of four, of course) and two quotes to back up my answer. Your answer will depend on what you think "key" means to the test writers. Hope your teacher did good test prep with you.

PAGE THIRTEEN: ESSAY TIME

In this tiny text box that will let you view about six lines of your essay at a time, write an essay "that provides and analysis of how Sexton transforms Daedelus and Icarus." Use evidence from both texts. No kidding-- this text box is tiny. And no, you can't cut and paste quotes directly from the texts.

But the big question here-- who is going to assess this, and on what basis? Somehow I don't think it's going to be a big room full of people who know both their mythology and their Sexton.

PAGE FOURTEEN: ABIGAIL ADAMS

So now we're on to biography. It's a selection from the National Women's History Museum, so you know it is going to be a vibrant and exciting text. I suppose it could be worse--we could be reading from an encyclopedia.

The questions want to know what "advocate for women" means, and to pick an example of Adams being an advocate. In other words, the kinds of questions that my students would immediately id as questions that don't require them to actually read the selection.

PAGE FIFTEEN: ADAMS

This page wants to know which question goes unanswered by the selection, and then for Part B asks to select a statement that is true about the biography but which supports the answer for A. Not hopelessly twisty.

PAGE SIXTEEN: MORE BIO

Connect the two central ideas of this selection. So, figure out what the writers believe are the two main ideas, and then try figure out what they think the writers see as a connection. Like most of these questions, these will be handled backwards. I'm not going to do a close reading of the selection-- I'm going to close read the questions and answers and then use the selection just as a set of clues about which answer to pick. And this is how answering multiple choice questions about a short selection is a task not much like authentic reading or pretty much any other task in the world.

PAGE SEVENTEEN: ABIGAIL LETTER

Now we're going to read the Adams family mail. This is one of her letters agitating for the rights of women; our questions will focus on her use of "tyrant" based entirely on the text itself, because no conversation between Abigail and John Adams mentioning tyranny in 1776 could possibly be informed by any historical or personal context.

PAGE EIGHTEEN: STILL VIOLATING FOUNDING FATHER & MOTHER PRIVACY

Same letter. Now I'm supposed to decide what the second graph most contributes to the text as a whole. Maybe I'm just a Below basic kind of guy, but I am pretty sure that the correct answer is not among the four choices. That just makes it harder to decide which other two paragraphs expand on the idea of graph #2.

PAGE NINETEEN: BOSTON

Now we'll decide what her main point about Boston is in the letter. This is a pretty straightforward and literal reading for details kind of question. Maybe the PARCC folks are trying to boost some morale on the home stretch here.

Oh hell. I have a message telling me I have less than five minutes left.

PAGE TWENTY: JOHN'S TURN

Now we have to pick the paraphrase of a quote from Adams that the test writers think is the berries. Another set of questions that do not require me to actually read the selection, so thank goodness for small favors.

PAGE TWENTY-ONE: MORE JOHN

Again, interpretation and support. Because making sense out of colonial letter-writing English is just like current reading. I mean, we've tested me on a boring general science piece, classical poetry, modern poetry, and a pair of colonial letters. Does it seem like that sampling should tell us everything there is to know about the full width and breadth of student reading ability?

PAGE TWENTY-TWO: BOTH LETTERS

Again, in one page, we have two sets of scrollers, tabs for toggling between works, and drag and drop boxes for the answers. Does it really not occur to these people that there are students in this country who rarely-if-ever lay hands on a computer?

This is a multitask page. We're asking for a claim made by the writer and a detail to back up that claim, but we're doing both letters on the same page and we're selecting ideas and support only from the options provided by the test. This is not complex. It does not involve any special Depth of Knowledge. It's just a confusing mess.

PAGE TWENTY-THREE: FINAL ESSAY

Contrast the Adams' views of freedom and independence. Support your response with details from the three sources (yes, we've got three tabs now). Write it in this tiny text box.

Do you suppose that somebody's previous knowledge of John and Abigail and the American Revolution might be part of what we're inadvertently testing here? Do you suppose that the readers who grade these essays will themselves be history scholars and writing instructors? What, if anything, will this essay tell us about the student's reading skills?

DONE

Man. I have put this off for a long time because I knew it would give me a rage headache, and I was not wrong. How anybody can claim that the results from a test like this would give us a clear, nuanced picture of student reading skills is beyond my comprehension. Unnecessarily complicated, heavily favoring students who have prior background knowledge, and absolutely demanding that test prep be done with students, this is everything one could want in an inauthentic assessment that provides those of us in the classroom with little or no actual useful data about our students.

If this test came as part of a packaged bunch of materials for my classroom, it would go in the Big Circular File of publishers materials that I never, ever use because they are crap. What a bunch of junk. If you have stuck it out with me here, God bless you. I don't recommend that you give yourself the full PARCC sample treatment, but I heartily recommend it to every person who declares that these are wonderful tests that will help revolutionize education. Good luck to them as well.

Sunday, January 18, 2015

Testing: What Purposes?

As the Defenders of Big Standardized Tests have rushed to protect and preserve ~~this important revenue stream~~ this monster program, they have proposed a few gazillion reasons that testing must happen, that these big bubbly blunt force objects of education serve many important purposes.

The sheer volume of purported purposes makes it appear that BS Tests are almost magical. And yet, when we start working our way down the list and look at each purpose by itself...

Teacher Evaluation.

The notion that test results can be used to determine how much value a teacher added to an individual student (which is itself a creepy concept) has been debunked, disproven, and rejected by so many knowledgeable people it's hard to believe that anyone could still defend it. At this point, Arne Duncan would look wiser insisting that the earth is a giant flat disc on the back of a turtle. There's a whole argument to be had about what to do with teacher evaluations once we have them, but if we decide that we do want to evaluate teachers for whatever purpose, evaluations based on BS Tests do not even make the Top 100 list.

Inform Instruction: Micro Division

Can I use BS Tests to help me decide how to shape, direct and fine tune my classroom practices this year? Can I use the BS Tests results from the test given in March and sent back to us over the summer to better teach the students who won't be in my class by the time I can see their individual scores? Are you kidding me?

BS Tests are useless as methods of tuning and tweaking instruction of particular students in the same year. And we don't need a tool to do that any way because that's what teachers do every single day. I do dozens of micro-assessments on a daily basis, formal and informal, to determine just where my students stand on whatever I'm teaching. The notion that a BS Test can help with this is just bizarre.

Inform Instruction: Macro Division

Okay, so will year-to-year testing allow a school to say, "We need to tweak our program in this direction." The answer is yes, kind of. Many, many schools do this kind of study, and it boils down to coming together to say, "We've gotten as far as we can by actually teaching the subject matter. But test study shows that students are messing up this particular type of question, so we need to do more test prep--I mean, instructional focus, on answering these kinds of test questions."

But is giving every single student a BS Test every single year the best way to do this? Well, no. If we're just evaluating the program, a sampling would be sufficient. And as Catherine Gerwitz pointed out at EdWeek, this is one of many test functions that could already be handled by NAEP.

Measuring Quality for Accountability

It seems reasonable to ask the question, "How well are our schools doing, really?" It also seems reasonable to ask, "How good is my marriage, really?" or "How well do I kiss, really?" But if you imagine a standardized test is going to tell you, you're ready to buy swampland in Florida.

Here's a great article that addresses the issue back in 1998, before it was so politically freighted. That's more technical answer. The less technical answer is to ask-- when people wonder about how good a school is, or ask about schools, or brag about schools, or complain about schools, how often is that directly related to BS Tests results. When someone says, "I want to send my kids to a great school," does that question have anything to do with how well their kid will be prepped to take a narrow bubble test?

BS Tests don't measure school quality.

Competition Among Schools

"If we don't give the BS Test," opine some advocates, "how will we be able to stack rank all the schools of this country." (I'm paraphrasing for them).

The most obvious question here is, why do we need to? What educational benefit do I get in my 11th grade English classroom out of know how my students compare to students in Iowa? In what parallel universe would we find me saying either, "Well, I wasn't actually going to try to teach you anything, but now that I see how well they're doing in Iowa, I'm going to actually try" or "Well, we were going to do some really cool stuff this week, but I don't want to get too far ahead of the Iowans."

But even if I were to accept the value of intra-school competition, why would I use this tool, and why would I use it every year for every student? Again, the NAEP is already a better tool. The current crop of BS Tests cover a narrow slice of what schools do. Using these to compare schools is like making every single musician in the orchestra audition by playing a selection on oboe.

The Achievement Gap

We used to talk about making the pig fatter by repeatedly measuring it. Now we have the argument that if we repeatedly weight two pigs, they will get closer to weighing the same.

The data are pretty clear-- in our more-than-a-decade of test-based accountability, the achievement gap has not closed. In fact, in some areas, it has gotten wider. It seems not-particularly-radical to point out that doubling down on what has not worked is unlikely to, well, work.

The "achievement gap" is, in fact, a standardized test score gap. Of all the gaps we can find related to social justice and equity in our nation-- the income gap, the mortality gap, the getting-sent-to-prison gap, the housing gap, the health care gap, the being-on-the-receiving-end-of-violence gap-- of all these gaps, we really want to throw all our weight behind how well people score on the BS Tests?

Finding the Failures

Civil rights groups that back testing seem to feel that the BS Test and the reporting requirements of NCLB (regularly hailed as many people's favorite part of the law) made it impossible for schools and school districts to hide their failures. By dis-aggregating test results, we can quickly and easily see which schools are failing and address the issue. But what information have we really collected, and what are we actually doing about it?

We already know that the BS Tests correspond to family income. We haven't found out anything with BS Tests that we couldn't have predicted by looking at family income. And how have we responded? Certainly not by saying, "This school is woefully underfunded, lacking both the resources and the infrastructure to really educate these students." No, we can't do that. Instead we encourage students to show grit, or we offer us "failing" schools as turnaround/investment opportunities for privatizers. Remember-- you don't fix schools by throwing money at them. To fix schools, you have to throw money at charter operators.

Civil Rights

For me, this is the closest we come to a legit reason for BS Tests. Essentially, the civil rights argument is that test results provide a smoking gun that can be used to indict districts so steeped in racism that they routinely deny even the most rudimentary features of decent schooling.

But once again, it doesn't seem to work that way. First, we don't learn anything we didn't already know. Second, districts don't respond by trying to actually fix the problem, but simply by complying with whatever federal regulation demands, and that just turns into more investment opportunities. Name a school district that in the last decade of BS Testing has notably improved its service of minority and poor students because of test results. No, instead, we have districts where the influx of charter operations to fix "failing" schools has brought gentrification and renewed segregation.

BS Testing also replicates the worst side effect of snake oil cures-- it creates the illusion that you're actually working on the problem and keeps you from investing your resources in a search for real solutions.

Expectations

On the other hand, one of the dumbest supports of BS Testing is the idea, beloved by Arne Duncan, that expectations are the magical key to everything. Students with special needs don't perform well in school because nobody expects them to. So we must have BS Tests, and we must give them to everyone the same way. Also, in order to dominate the high jump in the next Olympics, schools will now require all students to clear a high jump bar set at 6' before they may eat lunch. That includes children who are wheelchair bound, because expectations.

Informing parents

Yes, somehow BS Test advocates imagine that parents have no idea how their children are doing in school unless they can see the results of a federally-mandated BS Test. The student's grades, the students daily tests and quizzes and writing assignments and practice papers provide no information. Nor could a parent actually speak to a teacher face to face or through e-mail to ask about their child's progress.

Somehow BS Test advocates imagine a world where teachers are incompetent and parents are clueless. Even if that is true in one corner or another, how, exactly, would a BS Test score help? How would a terrible teacher or a dopey parent use that single set of scores to do... anything? I can imagine there are places where parents want more transparency from their schools, but even so-- how do BS Tests, which measure so little and measure it so poorly, give them that?

Informing government

Without BS Testing, ask advocates, how will the federal government know how schools are doing? I have two questions in response.

1) What makes you think BS Tests will tell you that? Why not just the older, better NAEP test instead?

2) Why do the feds need to know?

Bottom Line

Many of the arguments for BS Testing depend on a non sequitor construction: "Nutrition is a problem in some countries, so I should buy a hat." Advocates start with a legitimate issue, like the problems of poverty in schools, and suggest BS Testing as a solution, even though it offers none.

In fact there's little that BS Tests can help with, because they are limited and poorly-made tools. "I need to nail this home together," say test advocates. "So hand me that banana." Tests simply can't deliver as advertised.

The arguments for testing are also backwards-manufactured. Instead of asking, "Of all the possible solutions in the world, how could we help a teacher steer instruction during the year," testing advocates start with the end ("We are going to give these tests") and then struggle to somehow connect those conclusions to the goal.

If you were going to address the problems of poverty and equity in this country, how would you do it? If you were going to figure out if someone was a good teacher or not, how could we tell that? How would you tell good schools from bad ones, and how would you fix the bad ones?

The first answer that pops into your mind for any of those questions is not, "Give a big computer-based bubble test on reading and math."

Nor can we say just give it a shot, because it might help and what does it really hurt? BS Tests come with tremendous costs, from the huge costs of the tests to the costs of the tech needed to administer them to the costs in a shorter school year and the human costs in stress and misery for the small humans forced to take these. And we have yet to see what the long-term costs are for raising a generation to think that a well-educated person is one who can do a good job of bubbling in answers on a BS Test.

The federal BS Test mandate needs to go away because the BS Testing does not deliver any of the outcomes that it promises and comes at too great costs.

Saturday, October 25, 2014

Test Prep Texts

Reformsters like to claim that the new generation of standards and tests have moved us beyond test prep. "No more rote memorization," they'll say. "Now we'll be testing critical thinky skills and depth of knowledge."

They are wrong on several counts. First, there are no standardized tests for critical thinking. Nor do I believe there ever will be. Let's consider the challenges to create a single such test question. We'd need to

1) design a deeply thinky, open-ended question that will play the same for every child from Florida to Alaska that generates

2) a million potentially divergent answers in a million different directions but which

3) can still be consistently mass-scored by a computer or army of low-skills test scorers. Plus

4) all this must be accomplished at a low cost so that whatever company doing it doesn't go broke.

Nothing that the testing industry has done in the history of ever would suggest that they have the slightest clue how to do this successfully. So what we get is a bunch of workarounds, cut corners, and plastic imitations of critical thought, such as questions where students must bubble in the correct piece of evidence one must be guided by, or more commonly, the one correct conclusion a good critical thinker must reach. Pro tip: if you expect every one of millions of human beings to answer your question with the same answer, it's not an open-ended question, and you aren't measuring critical thinking skills.

No, what we've got now is new tests that require more test prep rather than less. Here's why.

In prehistoric slate-and-charcoal tests, we would give the student a question such as "4 + 2 = ?" Because we had learned the conventions of that simple set-up, the student knew exactly what was being asked, and exactly how to answer it. The only test prep required was making sure that the student knew what you get when you add two and four.

The modern test problem is exemplified by a student I was once supervising in a ~~test prep~~ academic remediation class. He was working on a popular on-line ~~test prep~~ teaching program, and he had stopped, stared at the screen, typed, entered, stared some more, typed again-- rinse repeat, finally with lots of staring. I stepped up behind him; he was frustrated. "Do you need a hand with this problem?" I asked.

"No," he said. "I know what the answer is. I just don't know exactly how they want me to say it."

Meredith Broussard captured the issue masterfully in her Atlantic article last summer, "Why Poor Schools Can't Win at Standardized Testing." She concluded that the very best way to get students ready for the Big Test is to get them textbooks written by the same big three corporations that are producing the tests.

The most important test prep is getting students used to A) how the test will ask the questions and B) how the test wants students to answer the question. More complex (excuse me-- "rigorous") items just mean multiple ways to ask the question (and multiple ways to interpret the question that you ask) and multiple ways to answer the question.

So teachers are spending lots of time teaching students "When they ask X, what they're looking for is Y" as well as "When the want Y, they want you to say it like this." We practice reading short, context-free crappy excerpts, and then we learn what sorts of things the questions are rally looking for. We are doing more ~~test prep~~ carefully focused aligned instruction than ever.

If you're fortunate enough to be studying out of a Pearson text, you'll be ~~test prepped~~ educationally prepared for the Pearson test. If your students are studying out of some Brand X textbook, they won't be learning the Right Way to ask the question nor the Right Way to answer it. And what Broussard also revealed is that many large urban school districts (she was looking at Philly, but there's no reason to believe they are hugely unique in this respect) do not have the money to put the proper test prep books in front of their students.

It's just one more way that poor school systems get the shaft. Or if you're more conspiratorially minded, it's one more ways that large urban systems are set up for failure as a prelude to letting charter and private schools get their hands on all that sweet, sweet cash.

And even in less poor districts, test prep texts are a challenge. Remember-- if your books are more than about four years old, they probably aren't ~~giving good test prep~~ properly aligned with the tested standards. You need to replace them all, even if you're on a seven or ten year book replacement cycle.

Test prep is not only alive and well. It is more necessary, and more profitable, than ever.

Monday, October 6, 2014

Depth of Knowledge? You'll Need Hip Boots.

Have you met Webb's Depth of Knowledge in all its reformy goodness. I just spent a couple of blood pressure-elevating hours with it. Here's the scoop.

In Pennsylvania, our state department of education has Intermediate Units which are basically regional offices for the department. The IU's do some useful work, but they are also the mechanism by which the state pumps the Kool-Aid of the Week out into local districts.

Today my district hosted a pair of IU ladies today (IU reps are typically people who tried classroom teaching on for size and decided to move on to other things). As a courtesy, I'll refer to them as Bert and Ernie, because one was shorter are chirpier and the other has a taller frame and a lower voice. I've actually sat through DOK training before, but this was a bit clearer and direct (but not in a good way).

Why bother with DOK?

Bert and Ernie cleared this up right away. Here's what was written on one of the first slides in the presentation:

It's not fair to students if the first time they see a Depth of Knowledge 2 or 3 question is on a state test (PSSA or Keystone).

In other words, DOK is test prep.

Ernie showed us a pie chart breaking down the share of DOK 2 and 3 questions. She asked how we thought the state will assess DOK 4 questions? Someone went with the obvious "on the test" answer, and Ernie said no, that since DOK 4 questions take time, the Test "unfortunately" could not do that.

There was never any other reason. Bert and Ernie did not even attempt to pretend to make a case that attending to DOK would help students in life, aid their understand, or even improve their learning. This is test prep.

Where did it come from?

Webb (it's a person, not a piece of jargon) developed his DOK stuff in some sort of conjunction with CCSSO. Ernie read out what the initials stand for and then said without a trace of irony, as God is my witness, "They sound like real important people, so we should trust them." She did not mention their connection to the Common Core which, given the huge amount of CCSS love that was going to be thrown around, seems like an odd oversight. The presenters did show us a graphic reminding us that standards, curriculum, and assessments are tied together like the great circle of life. So there's that.

How does it work?

This turned out to be the Great White Whale of the morning. We watched two videos from the Teacher Channel that showed well-managed dog and pony shows in classrooms. Bert noted that she really liked how the students didn't react to or for the camera. You know how you get that? By having them spend lots of time in front of the cameras, say, rehearsing their stuff over and over.

The first grade class was pretty impressive, but it also only had ten children in it. One of my colleagues asked if the techniques can be used in classes with more than ten students (aka, classes in the real world) and that opened up an interesting side note. The duo noted that the key here is routine and expectations, and that you need to spend the first few weeks of school hammering in your classroom routines so that you could manage more work. One teacher in the crowd noted that this would be easier if all teachers had the same expectations (apparently we were all afraid to use the word "rules") and Ernie allowed as how having set expectations and routines from K through the upper grades would make all of this work much better. "Wouldn't it be lovely?" she said.

Because when you've got a system that doesn't work very well with real, live children, the solution is to regiment the children and put them in lockstep. If the system and the childron don't mesh well-- change the children.

Increasing rigor!

You might have thought this section would come with a definition of that illusive magical quality, but no. We still can't really explain what it is, but we know that we can increase rigor by ramping up content or task or both.

We had some examples, but that brought up another unsolved mystery of the day. "Explain where you live" (DOK 1) ramped its way up to "Explain why your city is better than these other cities" (DOK 3). One of my colleagues observed that this was not only a change in rigor, but a complete change of the task and content at hand. Bert hemmed and hawed and did that little I Will Talk To You Later But For Right Now Let's Agree To Ignore Your Point dance, and no answer ever appeared.

So if you are designing a lesson, "List the names of the planets" might be a DOK 1 question, but a good DOK 3 question for that same lesson might be "Compare and contrast Shakespeare's treatment of female characters in three of his tragedies."

Audience participation

Bert and Ernie lost most of the crowd pretty early on, and by the time we arrived at the audience participation portion (two hours later), the audience seemed to have largely checked out. This would have been an interesting time for them to demonstrate how to handle a class when your plan is bombing and your class is disengaged and checked out, but they went with Pretending Everything Is Going Swell.

The audience participation section highlighted just how squishy Depth of Knowledge is. Bert and Ernie consigned all vocabulary-related activities to Level 1, because "you know the definition or you don't." That's fairly representative of how test creators seem to think, but it is such a stunted version of language use, the mind reels. Yes, words have definitions. But there's a reason that centuries of poetry and song lyric that all basically mean, "I would like to have the sex with you," have impressed women far more than simply saying "I would like to have the sex with you."

There's a lot of this in DOK, a lot of just blithely saying, "Well, this is what was going on in the person's brain when they did this, so this is the level we'll assign this task."

DOK's big weakness

DOK is not total crap. There are some ideas in there that can lead to some useful thinking about thinking. And if you set it side by side with the venerable Bloom's, it can get your brain working in the same way that Bloom's used to.

But like all test prep activities, DOK does not set out to teach students any useful habits of mind. It is not intended to educate; it is intended to train students to respond to certain sorts of tasks in a particular manner. This is not about education and learning; this is about training and compliance. It's a useful window into the minds of the people who are writing test items for the Big Test, if you're concerned about your students' test scores. If you're interested in education, this may not be the best use of your morning.

Thursday, August 21, 2014

Duncan Tries To Hear Teachers

US Secretary of Education Arne Duncan is here with some back-to-school blogging to assure folks that he is totes listening to somebody. His back-to-school conversation comes with two messages.

First, he wants to send out a big thank you to all the folks who helped create some super-duper data points last year-- specifically, the high school graduation rate and the college enrollment rate. I might be inclined to wonder about A) the reality behind those juicy stats and B) what it actually means. But Arne knows what it means:

These achievements are also indications of deeper, more successful relationships with our students. All of us who’ve worked with young people know how much they yearn for adults to care about them and know them as individuals.

Reading Duncan's words always induces an odd sort of vertiginous disorientation as one tries to take in the huge measured-in-light-years distance between the things he says and the policies he pursues. What in the four requirements of Race to the Top would possibly indicate that Duncan's administration is pursuing policies that develop these kind of relationships or satisfy these alleged yearnings? Is it the way teachers fates have a federally mandated dependency on student test scores? Is it the sweet embrace of one-size-fits-all national standards? Maybe it's the grueling program of punishing tests.

Which brings us to the second message.

Duncan says he's been having many many conversations with teachers, "often led by Teacher and Principal Ambassador Fellows" (those teachy folks who have been carefully vetted and selected by the DOE, so you know they're a real collection of widely varied viewpoints). And in those conversations, he's picked a little something something about standardized testing. Which he still thinks is basically swell.

Assessment of student progress has a fundamental place in teaching and learning – few question that teachers, schools and parents need to know what progress students are making.

Also, a bicycle, because a vest has no sleeves. Sure, classroom assessment is important. But recognizing that importance has nothing at all to do with making a case for standardized testing, particularly of the current brand. "Medicine is important" is true, but it's no justification for jamming aspirin into somebody's compound fracture.

Anyway, Arne has picked up three specific concerns:

It doesn’t make sense to hold them [educators] accountable during this transition year for results on the new assessments – a test many of them have not seen before – and as many are coming up to speed with new standards.
The standardized tests they have today focus too much on basic skills, not enough on critical thinking and deeper learning.
Testing – and test preparation – takes up too much time.

Duncan is shocked-- shocked!!-- that anyone would think it's a good idea to make a high stakes test the measure of student achievement or teacher effectiveness. "Growth is what matters. No teacher or school should be judged on any one test, or tests alone –" And here comes the vertiginous woozies (dibs on this as a band name) again, because that would be a heartening quote if it did not come from the very same office which decreed that by order of the federal government high stakes tests must be used as a measure of student achievement and teacher effectiveness. Duncan is talking about this test-based evaluation of students and teachers as if it just spontaneously occurred, like some sort of weird virus suddenly passed around at state ed department sleepover camp, and not a rule that Duncan's office demanded everyone follow. Has Duncan forgotten that he just made the entire state of Washington declare itself a Failing School Disaster Zone precisely because they refused to use high stakes tests as a measure of student achievement and teacher effectiveness?

No test will ever measure what a student is, or can be. It’s simply one measure of one kind of progress. Yet in too many places, testing itself has become a distraction from the work it is meant to support.

You know what one might conclude from that? One might conclude that the testing is a doing an ever-so-crappy job of supporting "the work it is meant to support."

States will have the opportunity to request a delay in when test results matter for teacher evaluation during this transition. As we always have, we’ll work with them in a spirit of flexibility to develop a plan that works...

I would like to check with someone from Washington to see what it feels to be flailed with that spirit of flexibility. But Duncan is opening the door to states postponing the most painful consequences of testing for one year, because, you know, teachers' voices.

Anthony Cody has correctly pointed out that one other voice has spoken up in favor of this-- the voice of Bill Gates. Unfortunately, we'll never know for certain how this all played out. Did Duncan decide to obey the Call of Gates and try to use it to mollify teachers? Is the Voice of Gates so powerful that it blasted the wax from Arne's ears and he could hear teachers finally? Is he bending to political realities, or trying to do damage control.

I have a question I'm more interested in-- what difference will a year make?

Duncan seems to think that some time will improve the tests themselves.

Many educators, and parents, have made clear that they’re supportive of assessment that measures what matters – but that a lot of tests today don’t do that – they focus too much on basic skills rather than problem solving and critical thinking. That’s why we’ve committed a third of a billion dollars to two consortia of states working to create new assessments that get beyond the bubble test, and do a better job of measuring critical thinking and writing.

Never going to happen. National standardized test means test that can be quickly checked and graded at large scale and low cost (or else the testmakers can't profit from it). The college board has had decades to refine their craft, and their refined craft looks like-- a bubble test.

As far as Duncan's other concerns go-- a year will not matter. Much of what he decries is the direct result of making the stakes of these tests extremely high. Student success, teacher careers, school existence all ride on The Test. As long as they do, it is absurd to imagine that The Test will not dominate the school landscape. And that domination is only made worse by the many VAMtastic faux formulas in circulation.

Too much testing can rob school buildings of joy, and cause unnecessary stress. This issue is a priority for us, and we’ll continue to work throughout the fall on efforts to cut back on over-testing.

Oh, the woozies. Duncan's office needs to do one thing, and one thing only-- remove the huge stakes from The Test. Don't use it to judge students, don't use it to judge teachers, don't use it to judge schools and districts. It's that attachment of huge stakes-- not any innate qualities of The Test itself-- that has created the test-drive joy-sucking school-deadening culture that Duncan both creates and criticizes. If the department doesn't address that, it will not matter whether we wait one year or ten-- the results will be the same.

Pages