Janresseger: New Research Yet Again Proves the Folly of Judging Teachers by Their Students’ Test Scores

The Obama Administration’s public education policy, administered by Secretary of Education Arne Duncan, was deeply flawed by its dependence on technocracy. In the 1990s, Congress had been wooed by researchers who had developed the capacity to produce giant, computer-generated data sets. What fell out of style in school evaluations were personal classroom observations by administrators who were more likely to notice the human connections that teachers and children depended on for building trusting relationships to foster learning.

Technocratic policy became law in 2002, when President George W. Bush signed the omnibus No Child Left Behind Act. Technocratic policy reached its apogee in 2009 as Arne Duncan’s Race to the Top grant program became a centerpiece of the federal stimulus bill passed by Congress to ameliorate the 2008 Great Recession.

In an important 2014 article, the late Mike Rose, a professor of education, challenged the dominant technocratic ideology.  He believed that excellent teaching cannot be measured by the number of correct answers any teacher’s students mark on a standardized test. Rose reports: The “classrooms (of excellent teachers) were safe. They provided physical safety…. but there was also safety from insult and diminishment…. Intimately related to safety is respect…. Talking about safety and respect leads to a consideration of authority…. A teacher’s authority came not just with age or with the role, but from multiple sources—knowing the subject, appreciating students’ backgrounds, and providing a safe and respectful space. And even in traditionally run classrooms, authority was distributed…. These classrooms, then, were places of expectation and responsibility…. Overall the students I talked to, from primary-grade children to graduating seniors, had the sense that their teachers had their best interests at heart and their classrooms were good places to be.”

In her 2013 book, Reign of Error, Diane Ravitch reviews the technocratic strategy of Arne Duncan’s Race to the Top. To qualify for a federal grant under this program, states had to promise to evaluate public school teachers by the standardized test scores of their students: “Unfortunately, President Obama’s Race to the Top adopted the same test-based accountability as No Child Left Behind. The two programs differed in one important respect: where NCLB held schools accountable for low scores, Race to the Top held both schools and teachers accountable. States were encouraged to create data systems to link the test scores of individual students to individual teachers. If the students’ scores went up, the teacher was an ‘effective’ teacher; if the students’ scores did not go up, the teacher was an ‘ineffective’ teacher. If schools persistently had low scores, the school was a ‘failing’ school, and its staff should be punished.” (Reign of Error, p. 99)

Ravitch reminds readers of a core principle: “The cardinal rule of psychometrics is this: a test should be used only for the purpose for which it is designed. The tests are designed to measure student performance in comparison to a norm; they are not designed to measure teacher quality or teacher ‘performance.’” (Reign of Error, p. 111)

This week, Education Week’s Madeline Will covers major new longitudinal research documenting what we already knew: that holding teachers accountable for raising their students’ test scores neither improved teaching nor promoted students’ learning:

“Nationally, teacher evaluation reforms over the past decade had no impact on student test scores or educational attainment. ‘There was a tremendous amount of time and billions of dollars invested in putting these systems into place and they didn’t have the positive effects reformers were hoping for,’ said Joshua Bleiberg, an author of the study and a postdoctoral research associate at the Annenberg Institute for School Reform at Brown University… A team of researchers from Brown and Michigan State Universities and the Universities of Connecticut and North Carolina at Chapel Hill analyzed the timing of states’ adoption of the reforms alongside district-level student achievement data from 2009 to 2018 on standardized math and English/language arts test scores. They also analyzed the impact of the reforms on longer-term student outcomes including high school graduation and college enrollment. The researchers controlled for the adoption of other teacher accountability measures and reform efforts taking place around the same time, and found that their results remained unchanged. They found no evidence that, on average, the reforms had even a small positive effect on student achievement or educational attainment.”

Arne Duncan is no longer the U.S. Secretary of Education. And in 2015, Congress replaced the No Child Left Behind Act with a different federal education law, the Every Student Succeeds Act (ESSA), in which Congress permitted states more latitude in how they evaluate schoolteachers. So why is this new 2021 research so urgently important? Madeline Will reports, “Evaluation reform has already changed course. States overhauled their teacher-evaluation systems quickly, and many reversed course within just a few years.” Will adds, however, that in 2019, 34 states were still requiring “student-growth data in teacher evaluations.”

In 2019, for the Phi Delta Kappan, Kevin Close, Audrey Amrein-Beardsley, and Clarin Collins surveyed teacher evaluation systems across the states. Many states still evaluate teachers according to how much each teacher adds to a student’s learning as measured by test scores, a statistic called the Value-Added Measure (VAM). Practices across the states are slowly evolving: “While the legacy of VAMs as the ‘objective’ student growth measure remains in place to some degree, the definition of student growth in policy and practice is also changing. Before ESSA, student growth in terms of policy was synonymous with students’ year-to-year changes in performance on large-scale standardized tests (i.e., VAMs). Now, more states are using student learning objectives (SLOs) as alternative or sole ways to measure growth in student learning or teachers’ impact on growth. SLOs are defined as objectives set by teachers, sometimes in conjunction with teachers’ supervisors and/or students, to measure students’ growth. While SLOs can include one or more traditional assessments (e.g., statewide standardized tests), they can also include nontraditional assessments (e.g., district benchmarks, school-based assessments, teacher and classroom-based measures) to assess growth. Indeed, 55% (28 of 51) of states now report using or encouraging SLOs as part of their teacher evaluation systems, to some degree instead of VAMs.”

The Every Student Succeeds Act eased federal pressure on states to evaluate teachers by their students’ scores, but more than five years after its passage, remnants of these policies linger in the laws of many states. Once bad policy based on technocratic ideology has become embedded in state law, it may not be so easy to change course.

In a profound book, The Testing Charade: Pretending to Make Schools Better, the Harvard University psychometrician Daniel Koretz explains succinctly why students’ test scores cannot possibly separate “successful” from “failing” schools and why students’ test scores are an inaccurate and unfair standard for evaluating teachers:

“One aspect of the great inequity of the American educational system is that disadvantaged kids tend to be clustered in the same schools. The causes are complex, but the result is simple: some schools have far lower average scores…. Therefore, if one requires that all students must hit the proficient target by a certain date, these low-scoring schools will face far more demanding targets for gains than other schools do. This was not an accidental byproduct of the notion that ‘all children can learn to a high level.’ It was a deliberate and prominent part of many of the test-based accountability reforms…. Unfortunately… it seems that no one asked for evidence that these ambitious targets for gains were realistic. The specific targets were often an automatic consequence of where the Proficient standard was placed and the length of time schools were given to bring all students to that standard, which are both arbitrary.” (The Testing Charade, pp. 129-130)

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.

The views expressed by the blogger are not necessarily those of NEPC.

Jan Resseger

Before retiring, Jan Resseger staffed advocacy and programming to support public education justice in the national setting of the United Church of Christ—working ...