The National Center for the Analysis of Longitudinal Data in Education Research (CALDER) decided to take a look at edTPA, the teacher evaluation program of dubious value. CALDER's headline may be welcome to the folks at edTPA, but a quick look under the hood reveals a big bunch of baloney. The paper is informative and useful and pretty thorough, but it's not going to make you feel any better about edTPA.
CALDER is a tentacle of the American Institutes for Research, the folks who brought us the SBA test. The report itself is sponsored by the Gates Foundation and "an anonymous foundation," which-- really? Hey there, friend. Have you had any research done on your money-making product? Certainly-- here's some fine research sponsored by anonymous backers. You can totally trust it. CALDER does acknowledge that this is just a working paper, and "working papers have not undergone final formal review."
What are we talking about?
edTPA is a system meant to up the game of teacher entrance obstacles like the much-unloved PRAXIS exams. Coming up with a better teacher gatekeeper task than the PRAXIS is about as hard as coming up with a more pleasant organization than the Spanish Inquisition. In this case, there's no reason to assume that "better" is the same as "good."
The idea of coming up with something kind of like the process of becoming a board certified teacher is appealing, but edTPA has been roundly criticized (more than once) for reducing the process of learning the art and science of teaching to a series of hoop-jumping and paper-shuffling, an expensive exercise that involves being judged via video clips. The whole business is eminently game-able, and there are already companies out there to help you jump hoops. It's also a system that insults college ed departments by assuming that your college ed program, your professors, your co-operating teacher-- basically none of the people who work with you and give you a grade-- can be trusted to determine whether or not you should be a teacher. Only some bunch of unknown evaluators hired by Pearson (yeah, they're in on this, too) can decide if you should have a career or not.
Yeah, whine whine whine-- but is it any good?
As always with education research, we deploy the program first, and then we try to find out if it's any good. So although edTPA has been around for a bit, here comes the CALDER working paper to decide if we just wasted the time and money of a bunch of aspiring teachers. Or as the paper puts it,
Given the rapid policy diffusion of the edTPA, a performance-based, subject-specific assessment of teacher candidates, it is surprising that there is currently no existing large-scale research linking it to outcomes for inservice teachers and their students.
Well, I've read this paper so you don't have to. Let's take it a chapter at a time.
1. Background: The Teacher Education Accountability Movement
Hey, remember back in 2009 when Ed Secretary Arne Duncan said that "many if not most" of the nation's teacher education programs were mediocre? This paper does. Man, it's hard to believe that we didn't believe him when he talked about how much he respected teachers.
Want a bigger red flag about this report? Three footnotes in and we're citing the National Council on Teacher Quality, the least serious faux research group in the education field (insert here my reminder that these guys evaluate non-existent programs and evaluate other programs by reading commencement programs). Straight-faced head nod as well to policy initiatives to measure teacher education programs by measuring value added or subsequent employment history.
Fun fact: 600 ed programs in 40 states now use edTPA. Seven states require edTPA for licensure. Okay, not a "fun" fact so much as a discouraging one.
But let's talk about "theories of action" for how edTPA would actually improve the profession.
First, it could be used to weed out the chaff and keep those candidates unable "to participate in the labor market." That, CALDER wryly notes, would require "predictive validity around the cut point adopted." At least I think they were being wry.
Second, it might affect "candidate teaching practices." edTPA's own people suggest as much. This training to the test could be done independently by individual proto-teachers, or enforced by ed programs.
Third, schools could use edTPA scores as deciding factors in hiring.
CALDER notes that these three methods would only improve the teacher pool if edTPA scores have anything on God's green earth to do with how well the candidate can actually teach.
CALDER proposes to go looking for that very same white whale of data revelation. They poked through a bunch of longitudinal data from Washington State, looking for a correlation with employment (did the candidate get a job?) and effectiveness (sigh... student test scores).
Insert Rant Here
Student test scores are not a measure of teacher effectiveness. Student test scores are not a measure of teacher effectiveness. Student. Test. Scores. Are. Not. A. Measure. Of. Teacher. Effectiveness.
Ask parents what they want from their child's teacher. Ask those parents what they mean when they call someone a "good" teacher. They will not say, "Has students get good scores on the Big Standardized Tests." Ask any parent what they mean when they say, "I want my child to get a good education" and they will not reply, "Well, I want my kid to be good at taking standardized tests." Ask taxpayers what they expect to get for their school tax dollars and they will not say, "I pay taxes so that kids will be good at taking standardized tests."
I will push on through the rest of this paper, but this point alone invalidates any findings presented, because their measure of effective teaching is junk. It's like measuring the health of the rain forest by collecting chimpanzee toe nail clippings. It's like evaluating a restaurant by measuring the color spectrum ranges on its menus.
2. Assessment of Prospective Teachers and the Role of edTPA
CALDER is correct to say that edTPA is different from "traditional question-and-answer licensure tests," though as someone who earned his teacher stripes in 1979, I'm inclined to question just how "traditional" the Q&A tests are. The paper follows this up with a history of edTPA, and offers a good brief explanation of how it works:
The edTPA relies on the scoring of teacher candidates who are videotaped while teaching three to five lessons from an instructional unit to one class of students, along with assessments of teacher lesson plans, student work samples and evidence of student learning, and reflective commentaries by the candidate.
CALDER also explains what basis we have for imagining this system might work-- some other system:
Claims about the predictive validity of the edTPA are primarily based on small-scale pilot studies of the edTPA’s precursor, the Performance Assessment for California Teachers (PACT). Specifically, Newton (2010) finds positive correlations between PACT scores and future value-added for a group of 14 teacher candidates, while Darling-Hammond et al. (2013) use a sample of 52 mathematics teachers and 53 reading teachers and find that a one-standard deviation increase in PACT scores is associated with a .03 standard deviation increase in student achievement in either subject.
So, a tiny sample size on a similar-ish system.
CALDER also considers the question of whether edTPA can be used both to evaluate teacher practice and decide whether or not someone should get a teaching license (and if you find it weird that this conversation about making it harder to become a teacher is going on in the same world where we'll let you become a teacher with just five weeks of Teach for America training, join the club).
3. Data and Analytic Approach
CALDER's data come from the 2,362 Washington state teacher candidates who took edTPA in 2013-2014, the first year after the pilot year for edTPA in Washington. That whittles down to 1,424 teachers who actually landed jobs, which in turn whittled down to 277 grade 4-8 reading or math teachers.
Then they threw in student test scores on math and reading tests from the Measures of Student Progress Tests of 2012-2013 and 2013-2014, plus SBA testing from 2014-2015. They "standardized" these scores and "connected" them to demographic data. And then, cruncheroonies.
There follows a bunch of math that is far beyond my capabilities, with several equations supposedly providing a mathy mirror of the various theories of action. Plus mathy corrections for students who didn't test in the previous year, and by the way, what about sample selection bias? In particular, might not teachers be hired based on qualities not measured by edTPA but still related to student test scores? Oh, and they hear that VAM might have some issues, too, though they consulted Chetty's work, so once again, I'm unimpressed.
I am not a statistical analysis guy, and I don't play one on tv, but this model is beginning to look like it could be enhanced by twelve-sided dice and a pair of toads sacrificed under a full moon.
On the one hand, for all the reasons listed above, I'm not very excited about the results, whatever they seem to be. But we've come this far, so why not take it home. Here are some of the things they discovered.
Shockingly, it turns out that non-white, non-wealthy students don't do as well on standardized tests. So there's that.
Passing edTPA correlates to having a teaching job the following year. So.... people who are good at navigating the hoop-jumping and form-filling and resume-building of edTPA are also good at getting a job?
As for screening, edTPA results maybe correlate with better reading test scores for students, but they don't seem to have diddly to do with math scores. Well, actually, we're graphing them against reading value-added and math value-added scores, and while the math chart looks like a random spray, the reading chart looks like a group of scores that are kind of bell-curve shaped, and which do not rise as edTPA scores rise. There's a dip on the left side, suggesting that teachers with low edTPA scores correlate to students with low value-added reading.
So I'm not really sure what this is meant to show, and that's probably just me, but it sure doesn't scream, "Good edTPA scores produce good student reading VAM."
And there's more about sample bias and selection and size and it's all kind of messy and vague and ends with a sentence that just shouldn't be here:
Despite the fact that licensure tests appear correlated with productivity, the direct evidence of their efficacy as a workforce improvement tool is more mixed.
"Workforce improvement" and "productivity" belong in discussions of the toaster manufacture industry, not in a serious discussion of the teaching profession.
5. Policy Implications
So teachers who fail the edTPA are less likely to raise student reading test scores. Despite the small sample, the researchers think they're onto something here. They think that edTPA should be used to screen out low-performing reading teachers, though it would come at the cost of screening out some candidates who would have been effective teachers. But again-- I find the math suspicious.
With random distribution, 20% of edTPA failers would fall in the low-performing category. But "we find that 46% of reading teachers who fail the edTPA are in the low-performing category." So about half of the edTPA failers get low student results-- and half don't. But-- and check my eyeball work here-- it looks like roughly half of all the edTPAers failed to "add any value" to students. Mind you, I think the whole business of trying to use student VAM to measure teacher effectiveness is absolute bunk-- but even by their own rules, I'm not seeing anything here to write home about.
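Since I'm eyeballing here, a toy back-of-the-envelope check might help. This is my own sketch with made-up numbers (the teacher count and random seed are mine, not CALDER's data or model): if failing edTPA had nothing to do with classroom results, failers should land in a "low-performing" category covering 20% of teachers roughly 20% of the time, and the paper's 46% is a bit over twice that null rate.

```python
import random

random.seed(0)

# Null case (hypothetical numbers): edTPA failure is unrelated to the
# low-performing category, which covers 20% of teachers. Each simulated
# failer lands in that category with probability 0.20.
n_failers = 1000
in_low_category = [random.random() < 0.20 for _ in range(n_failers)]
null_rate = sum(in_low_category) / n_failers
print(round(null_rate, 2))  # hovers around 0.20

# The paper reports 46% of failers in the low-performing category --
# roughly 2.3 times what chance alone would predict, on a small sample.
observed = 0.46
print(round(observed / 0.20, 2))
```

That ratio is the whole argument for edTPA as a screen: failers show up in the bottom category at about double the chance rate. Whether that survives the small sample and the shakiness of VAM itself is the part I remain unconvinced about.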
I applaud CALDER's restraint:
Given that this is the first predictive validity study of the edTPA, and given the nuanced findings we describe above, we are hesitant to draw broad conclusions about the extent to which edTPA implementation will improve the quality of the teacher workforce.
My emphasis because-- really? Forty states and this is the first validity study?
Anyway, they point out (as they did at the top) that all of the theories of action would be more compelling and convincing if anyone knew whether or not edTPA is actually assessing anything real. And the writers suggest a few steps to take moving forward.
Those steps include reweighting and revising the rubrics, comparing edTPA to other evaluation methods (e.g., observations), checking the edTPA impact on minority teacher candidates, and looking at whether or not edTPA varies across different student teaching situations. And underlying all that-- the question of whether all of this is really worth the time and money that it costs.
These are all very worthwhile questions to consider, and I give CALDER credit for bringing them up.
And let me circle back around to my own conclusion. You will not find me pledging undying loyalty to many teacher education programs out there, because there certainly are some terrible ones. And you will not find me sticking up for PRAXIS (I type it in caps because I imagine the sound of a chain-smoking cat coughing up a hairball), which is a terrible way to decide whether someone is fit to be a teacher or not.
But edTPA's long and torturous parentage and history, as well as its insistence on generating revenue and putting candidates' fates in far-off impersonal hands-- well, it's just not a great candidate to assume the Teacher Evaluation Crown. It takes human beings to teach human beings how to teach human beings, not a complicated hoop-jumping paperwork festival. We know how to do this well, but it's not cheap and it's not easy and it won't make anybody rich. We can do the kind of "better" that is actually "good."