Thursday, November 19, 2015

More Evidence That Tests Measure SES

Want more proof, again, some more, of the connection between socio-economic status and standardized test results? Twitter follower Joseph Robertshaw pointed me at a pair of studies by Randy Hoover, PhD, at the Department of Teacher Education, Beeghly College of Education, Youngstown State University.

Hoover is now a professor emeritus, but he remains focused on the validity of standardized testing and the search for a valid and reliable accountability system. He now runs a website called the Teacher Advocate, and it's worth a look.

Hoover released two studies-- one in 2000, and one in 2007-- that looked at the validity of the Ohio Achievement Tests and the Ohio Graduation Test, and while there are no surprises here, you can add these to your file of scientific debunking of standardized testing. We're just going to look at the 2007 study, which was in part intended to check on the results of the 2000 study.

The bottom line of the earlier study appears right up front in the first paragraph of the 2007 paper:

The primary finding of this previous study was that student performance on the tests was most significantly (r = 0.80) affected by the non-school variables within the student social-economic living conditions. Indeed, the statistical significance of the predictive power of SES led to the inescapable conclusion that the tests had no academic accountability or validity whatsoever.

The 2007 study wanted to re-examine the findings, check the fairness and validity of the tests, and draw conclusions about what those findings meant to the Ohio School Report Card.

So what did Hoover find? Well, mostly that he was right the first time. He does take the time to offer a short lesson in statistical correlation analysis, which will be helpful if, like me, you are not a research scholar. Basically, the thing to remember is that a perfect correlation is 1.0 (or -1.0). So, getting punched in the nose correlates about 1.0 to feeling pain.
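If it helps to see the arithmetic, here is a minimal sketch of how a correlation coefficient (Pearson's r) gets computed. The numbers are invented for illustration and have nothing to do with Hoover's data; `pearson_r` is just a name I picked.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers, for illustration only
punches = [0, 1, 2, 3, 4]
pain = [0, 2, 4, 6, 8]       # rises in lockstep with punches
comfort = [8, 6, 4, 2, 0]    # falls in lockstep with punches

r_up = pearson_r(punches, pain)       # perfect positive correlation
r_down = pearson_r(punches, comfort)  # perfect negative correlation
```

Real-world data never lines up this neatly, which is why values like 0.75 or 0.80 count as strong.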

Hoover is out to find the correlation between what he calls the students' "lived experience" and district-level performance. That correlation turns out to be 0.78. Which is high.

If you like scatterplot charts (calling Jersey Jazzman), then Hoover has some of those for you, all driving home the same point. For instance, here's one looking at the percent of economically disadvantaged students as a predictor of district performance.

[Scatterplot: percent of economically disadvantaged students vs. district performance]

That's an r value of -0.75, which means you can do a pretty good job of predicting how a district will do based on how few or many economically disadvantaged students there are.
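To make "predicting" a little more concrete, here is a rough sketch of fitting a straight line to district data and reading a prediction off it. The five districts and their scores are made up by me, not taken from Hoover's paper, and `fit_line` and `predict` are hypothetical helper names. The last line shows one common way to read an r value: squaring it gives the share of the variation that a straight-line fit accounts for.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented districts: percent economically disadvantaged vs. a performance score
pct_disadvantaged = [10, 20, 30, 40, 50]
district_score = [95, 90, 80, 78, 70]

slope, intercept = fit_line(pct_disadvantaged, district_score)

def predict(pct):
    """Predicted score for a district with the given percent disadvantaged."""
    return intercept + slope * pct

# Squaring r gives the share of variation a straight-line fit accounts for
shared_variance = (-0.75) ** 2  # 0.5625, a bit over half
```

The slope comes out negative here, matching the direction of Hoover's chart: more disadvantaged students, lower predicted score.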

Hoover crunched together three factors to create what he calls a Lived Experience Index that shows, in fact, a 0.78 r value. Like Chris Tienken, Hoover has shown that we can pretty well assign a school or district a rating based on their demographics and just skip the whole testing business entirely.

Hoover takes things a step further, and reverse-maths the results to a plot of results with his Lived Experience Index factored out-- a sort of crude VAM sauce. He has a chart for those results, showing that there are poor schools performing well and rich schools performing poorly. Frankly, I think he's probably on shakier ground here, but it does support his conclusion that the Ohio school accountability system of the time was "grossly misleading at best and grossly unfair at worst," a system that "perpetuates the political fiction that poor children can't learn and teachers in schools with poor children can't teach."
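One common way to "factor out" a predictor is to fit a line and then look at what is left over (the residuals). I am assuming, not claiming, that this resembles what Hoover did; the sketch below uses invented districts and index values, and every name in it is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented data: a higher index value stands for more advantaged living conditions
districts = ["A", "B", "C", "D"]
index = [20, 40, 60, 80]
scores = [70, 72, 84, 86]

slope, intercept = fit_line(index, scores)

# Residual = actual score minus the score the index alone would predict
residuals = {
    name: score - (intercept + slope * x)
    for name, x, score in zip(districts, index, scores)
}
```

In this toy data the poorest district (A) lands slightly above its predicted score and the richest (D) slightly below, which is the kind of pattern Hoover's chart is meant to surface.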

That was back in 2007, so some of the landscape such as the Ohio school accountability system (well, public school accountability-- Ohio charters are apparently not accountable to anybody) has changed, along with many reformster advances of the past eight years.

But this research does stand as one more data point regarding standardized tests and their ability to measure SES far better than they measure anything else. 

25 comments:

  1. One problem here is that median family income also predicts teacher turnover and teacher experience, things that Hoover does not make any attempt to control for. His results could easily be consistent with those high rates of teacher turnover and low level of experience driving low test scores.

    I am curious to see if people here think that having high turnover and low levels of experience in the classroom have a negative impact on learning?

    1. It's your side that supports TfA.

    2. Dienne,

      I don't have a side. I think TfA gets much more attention than it deserves because it provides only a tiny number of teachers relative to number of K-12 teachers in the country. I also think that education schools do not do a very good job preparing teachers, especially where "content knowledge" is important like high schools.

    3. "I don't have a side." BWAHAHAHA! Thanks for the laugh - I needed one this week.

    4. I don't. My kids have always been sent to whichever public school the school board assigned them to. I like choice schools because it gives schools the opportunity to specialize in education, say having public progressive schools for those parents who seek that kind of education.

  2. My guess, TE, is that most or all of us think that "high rates of teacher turnover" and "low level of experience" are not good for kids -- this is, for most of us, probably the #1 reason to adamantly oppose VAM. Why would a good smart teacher choose to work in the low SES schools under a VAM system? They wouldn't. Even someone like me, who basically wants to "do good," would not choose to end their career prematurely by working in a low SES school.

    1. Julie,

      It seems to me that the high turnover rate and relative inexperience of teachers in less affluent school districts long predates any teacher evaluation system involving a VAM system. Do you disagree?

      In any case, it may well be that the low exam scores reflect school districts that have staffing situations that "are not good for kids", not simply the low household income levels in the districts.

    2. Because it is harder work! Why wouldn't this be the case? It IS harder work! Today, a suburban teacher could be less competent, work far fewer hours, etc., and still get a much higher VAM score... so yes, the problem predates VAM, so put VAM in there by all means to greatly accelerate the problem!

    3. Julie,

      I agree that teaching, especially teaching well, is hard work. It still might well be the case, however, that low test scores are actually reflecting what is going on in the classroom in low income districts, not just what goes on outside the classroom.

    4. Well, the American Statistical Association says that teachers only make up somewhere between 0% and 14% of influence on students' learning.

    5. Rebecca,

      I have never understood how people on these blogs can argue that 1) teachers have very minimal impact on student learning and 2) teachers must spend all their classroom time doing test prep.

      If test prep has no impact, why do it?

    6. You're the statistician guy. I'm just quoting what the statisticians' professional association says.

  3. Hoover and now Greene miss the point. And given some of their familiarity with basic statistics, the miss must be intentional. Of course, SES correlates with standardized test scores. But that's not the point. SES is not a dependent variable but rather a metric within which to look at the underlying data. It's like saying that the price of a car correlates with horsepower. That is probably true as well, though again, it's not relevant since it's unlikely that someone shopping for a $25,000 car is also shopping for one that costs $100,000.

    Instead, the more relevant question is what is the range of horsepower for cars that cost under $25,000. And can we learn anything from this? Similarly, what is the range of test scores from those in the bottom quintile of SES? And again, what can we learn from this? In fact, there is an entire branch of statistics built for this kind of analysis, called Bayesian statistics.

    Greene and Hoover likely don't want to look at things this way because they would see the same types of things that CREDO and Mathematica have found. If you look within the cohort of urban students (as an example conditioned on geography rather than income), you find that charter schools routinely outperform traditional schools. See links below.

    Again, just like the person buying the $25,000 car is not buying a $100,000 car, the poor family in the Bronx is choosing between a failing traditional school and a charter school rather than an affluent school in Greenwich.

    Stanford University's CREDO - "Across the 41 cities studied, students in charter schools learned significantly more than their peers attending traditional public schools – 40 more days worth of learning in math, and 28 more in reading." http://www.usnews.com/opinion/knowledge-bank/2015/03/19/new-study-shows-charter-schools-making-a-difference-in-cities

    Mathematica - "In our exploratory analysis, for example, we found that study charter schools serving more low income or low achieving students had statistically significant positive effects on math test scores" http://www.mathematica-mpr.com/~/media/publications/PDFs/education/charter_school_impacts.pdf

    1. But what, exactly, does "28 more days worth of learning" look like? How does one quantify that? And even if you could, is it statistically significant? More importantly, though: are test scores the only measure of a school's success?

    2. Days of learning is a metric. Consider the reading level of a student when he starts the year and when he ends the year. The difference is what he has learned during the year. And CREDO is saying that this increase is proportionally more extensive in charters. That's all. Statistical significance isn't relevant here since CREDO is measuring the population not a sample. In other words, the average height of the children in a school is the average height. It is not subject to error. However, if you are trying to estimate the average height of the school by looking at one classroom, there may be some error.

      Are test scores the only measure of a school's success? Of course not. But the fact that there are other measures in no way invalidates their use. When you go to the doctor, there are various objective measures such as blood pressure, body temperature, etc. Such measurements can be compared across doctors, hospitals or any such subset. It certainly doesn't tell the whole story since a doctor does far more than take your temperature. But it does provide one standardized measure.

    3. And doctors are not going to make a diagnosis on the basis of one standardized measure. So if a doctor said you had cancer based on your blood pressure, that would be an invalid use of measurement. That's what we're talking about.

    4. You are basically talking about sampling error as a result of dispersion across students and across time. Derek Briggs tries to address both of these in the article below. You are already measuring the effect of that teacher across many students. And as far as time, Briggs cites better results from measuring multi-year VAM (e.g. 2 or 3 yrs) rather than just 1 yr. In other words, I'm not saying that the measurement process can't be improved. And I'm also not saying that there are things which are important which are more difficult to measure. But none of that contradicts the idea of measuring results as well as we can ... and using those metrics to improve education ... and part of that is rewarding good teachers and helping (and possibly terminating) bad teachers.

      Here's the link:
      http://www.colorado.edu/education/sites/default/files/attached-files/Briggs_VAM%20Inferences_101511.pdf

  5. For the sake of argument, let's concede that charters get kids to score higher on tests. Two questions about this: 1. So what? Is scoring higher on standardized tests proven to lead to anything important? 2. Many charters get those higher scores by insisting on militaristic discipline and narrowing the curriculum. Do we want kids to be taught blind obedience in a limited curriculum just to score high on a test?

    1. Mike,

      The next post on the blog points out that scoring in the top quintile on standardized math tests is associated with an eightfold increase in the chances of a relatively poor student graduating from college. I think that is something important. Do you think the increase in graduation rates is unimportant?

    2. First, thank you for your honesty in the (sort of) concession. Let me take a shot at your questions:

      1. Tests - The data suggests that there is a strong connection between higher test scores and high school graduation as well as college matriculation. So yes, raising test scores matters, since higher scores indicate kids are learning something and are more likely to continue with their education. And getting an education is one of the key ways to increase your income and (for many kids) end the cycle of poverty. So yes, I'd say that's important.

      2. Discipline - Again, you are right. Many charters (like Success) do insist on a high level of discipline. And some disagree with this. But you miss the point about charters. The discipline isn't mandatory. Almost all charters are schools of choice. At this point, I think it's unlikely that parents are being "fooled" about the kind of school they are sending their kids to. But if parents want to send their kids to get more discipline (particularly when many of them may not be getting this element at home), who are you to tell those families that they should not have that choice? Those families aren't telling you what you should do with your children and you ought to likewise respect their choice.

    3. Alan - TE isn't "conceding" anything. He agrees with you. Though he denies taking "sides", he's on yours.

    4. Dienne,

      There is only one level of indent here. I think Alan was responding to Mike's post, not mine.

    5. Yep. I was replying to Mike not TE. I agree with TE. @ TE, if you could provide the link to the data showing increased graduation for top performers on standardized tests, that would be appreciated.

    6. Alan,

      It is in a graphic in the post just above this one on this blog.
