It's Guest Post day here, and my guest is William Bryant. Bryant is currently an edupreneur with a company focused on helping students get ready for college, but he spent a decade working in test development for the folks at ACT. He has some interesting insights to offer about why tests end up the way they do. They are important to understand not just because of the tests themselves, but because of testing's effect on curriculum. Read on.
Why Are Standardized Tests So Boring?: A Sensitive Subject
It’s a guiding principle in educational testing that test questions should not upset test-takers. Much like dinner conversation with in-laws, tests should refrain from referencing religion, or sex, or race, or politics -- anything that might provoke a heightened emotional response that could interfere with students’ ability to give their best effort.
Attention to “sensitivity” concerns, as they’re known, makes good sense conceptually, but in practice such concerns are responsible for much of why the standardized tests kids take in school are so ridiculously bland and unengaging. The drive to avoid potentially sensitive content constrains test developers to such a degree that we might legitimately question whether the cure is at least as bad as the disease.
So determined are test-makers to avoid triggering unwanted emotions, they end up compromising the validity of their tests by excluding essential educational content and restricting students’ opportunities to demonstrate the creative and critical thinking skills they’re actually capable of.
No one knows for certain if the tests are better or worse for being so cautious. There is no research defining sensitivity, no evidence-based catalog of topics to avoid, no study measuring the test-taking effects of “sensitive” content. For all anyone knows, inflaming emotions might actually improve test results -- though few test-makers would risk experimenting to find out.
No test-maker wants to hear from a teacher or parent that a student was stunned, enraged, offended, or even mildly disconcerted by something they encountered on a test. And in fairness, no test-maker wants to subject a test-taking kid to a hurtful or upsetting experience.
Since there is no research to guide decisions on sensitivity, the rules test-makers set for themselves are based strictly on their own judgment, and on some sense of industry practice. Inevitably they default to the most conservative positions possible: if a topic might conceivably be construed as sensitive, that’s enough reason to keep it off the test.
Typically, sensitivity guidelines steer test developers away from content focused on age, disability, gender, race, ethnicity, or sexual orientation. Test-makers also avoid subjects they deem inherently combustible, such as drugs and drinking, death and disease, religion and the occult, sexuality, current politics, race relations, and violence.
A “bias review” process gets applied in the course of developing passages and questions for testing, to weed out anything that might be offensive or unfair to certain subgroups -- typically African Americans, Asian Americans, Latinos, women, and sometimes Native Americans. The test-maker will send prospective test materials out for review by qualified educators who belong to these subgroups. If a reviewer thinks a test item is problematic, it gets tossed. Though this process is better than nothing, it reflects more butt-covering than enlightenment, putting test-maker and reviewer alike in the awkward position of saying, for instance, “These test items are not unfair to black students. How do we know? We had a black person look at them!”
Judgments on topics not pertaining to identity and cultural difference rest purely on the test-makers, who are as risk-averse as can be. In one example I’m familiar with, a passage about the mythological Greek figure Eurydice was rejected because the story deals with death and the underworld. Think of all the literature and art excluded from testing by that kind of criterion. Think of the impoverished portrait of human achievement and lived experience conveyed to students by such exclusions.
In another case, a passage on ants was rejected because it reported that males get booted out of the colony and die shortly after mating. I’m still not clear on whether the basis for that judgment centered on the reference to insects mating, insects dying, or the prospect of a student projecting insect gender relations onto human relations and being thereby too disturbed to think clearly. Whatever the case, rejecting such a passage on the basis of sensitivity concerns seems downright anti-science.
I’ve seen a pair of passages from Booker T. Washington and W. E. B. Du Bois nixed out of concern for racial sensitivity: you can’t have African Americans arguing with each other on questions of race. Test-makers strive to include people of color in their test content to satisfy requirements for cultural inclusivity. But those people of color cannot be engaged in the experience of being people of color -- which renders the whole impulse toward inclusivity hollow and cynical. Such an overabundance of caution does more to protect the test-maker than the student.
The validity of educational assessments that cannot reference slavery, evolution, Neanderthals, extreme weather events, natural life cycles, economic inequality, illness, and other such potentially sensitive topics seems severely compromised. More concerning still is the prospect of such tests driving curriculum. With school funding and teacher accountability riding on standardized test scores, teaching to the test makes irresistibly practical sense in many educational contexts. Thus, if the tests avoid great swaths of history, science, and literature, then so will curriculum.
The makers of the standardized tests schoolkids encounter argue that they are not interested in censoring educational content, only in recognizing that when students encounter potentially sensitive topics they need the presence of an adult to guide them through. The classroom and the dinner table are places for negotiating challenging subjects, not the testing environment, where kids are under pressure and on their own.
This rationale should rouse everyone to question why we continue to tolerate such artificial conditions for evaluating student learning. It essentially concedes either that testing will not align with curriculum, or that curriculum will align only with the things test-makers decide are safe enough to put in front of test-taking students. Surely we can recognize in this the severe design flaw that lies at the heart of the testing problem.