NEPC Resources on Classroom Teaching and Learning
The Conflict Over Parents’ Rights
Teachers Welcome Deprofessionalization
Five Myths About Teaching
Reviews Worth Sharing: The Effectiveness of Secondary Math Teachers from Teach for America and the Teaching Fellows Programs (Institute of Education Sciences, September 2013)
This review offers a critique of a teacher effectiveness experiment conducted by investigators from Mathematica Policy Research and published by the Institute of Education Sciences. The Mathematica experiment was designed to provide evidence about the effectiveness of teachers who were themselves high-achieving students and were trained by either Teach for America (TFA) or the Teaching Fellows programs.
NEPC Review: The Opportunity Myth (TNTP, September 2018)
A TNTP report aims to expose what it labels the “opportunity myth” in American education: that while schools purport to prepare students well, they don’t deliver. It paints a dramatic picture of American students being misled by false promises of opportunity, when they could make significant learning gains if they experienced grade-level content, strong instruction, deep engagement, and high expectations. The report contends that these negative experiences are primarily the result of educators’ daily decisions and are magnified for students of color and low-income students. Though the report presents an array of qualitative and quantitative data, some of its particular claims are not fully supported by evidence, and it is unclear how key constructs are measured. Importantly, in describing educators’ decisions, the report does not sufficiently account for larger systemic and societal impediments to opportunity that serve to establish and maintain many of the obstacles and problematic patterns the report observes.
Review of Continued Progress: Promising Evidence on Personalized Learning
This evaluation report from RAND focuses on school-wide initiatives funded by the Bill & Melinda Gates Foundation to promote teaching approaches touted as personalized learning. These reforms generally rely on digital technology and encompass a range of strategies, such as developing learner profiles with individualized goals, and using data to provide personalized learning paths in which students have choice, get individualized support, and engage in learning outside school. The research, which includes many high-quality elements, suggests that some of the studied approaches are associated with higher scores on a common assessment (the MAP). Broad conclusions about the efficacy of technology-based personalized learning, however, are not warranted by the research. Limitations include a sample of treatment schools that is unrepresentative of the general population of schools, the study’s lack of a threshold for what qualified as implementing “personalized learning” in the treatment schools, and the reality that disruptive strategies such as competency-based progression, which require the largest departures from current practice, were rarely implemented in the studied schools.
NEPC Review: The Hidden Value of Curriculum Reform: Do States and Districts Receive the Most Bang for Their Curriculum Buck? (Center for American Progress, October 2015)
A recent Center for American Progress report, The Hidden Value of Curriculum Reform, draws bold conclusions about the high payoff of better textbooks. It finds that textbooks are rarely chosen based on evidence of effectiveness and true alignment with standards. It also finds that elementary mathematics textbook prices vary little, regardless of quality or whether a state recommends particular texts for adoption. While these elements of the report have merit, it then overreaches. Based on a single prior study of first- and second-grade math curricula in some high-poverty schools, it draws general conclusions about the Return on Investment (ROI) for good versus weak textbooks, ignoring key findings within the original study and also ignoring other research showing that curricular effects vary, depending on context and implementation; that is, a good book is no guarantee of benefits. The report then compares its estimated ROI for textbooks with another study’s reported ROI for other interventions, while ignoring nuances of that second study’s calculations. Overall, Hidden Value is timely and provides important insights about the need for evidence-based curriculum selection. However, its highly optimistic claims about curricular ROI could reduce support for other worthy interventions and spur simplistic comparisons of textbooks that lead to naïve conclusions about which curriculum is “best.”
Does Class Size Matter?
Review of Fixing Classroom Observations: How Common Core Will Change the Way We Look at Teaching
Fixing Classroom Observations: How Common Core Will Change the Way We Look at Teaching is an advocacy document. It asserts that current classroom observation rubrics are not aligned with Common Core standards and have too many cumbersome criteria; thus, observers are overloaded, give too many high ratings, and seldom give productive feedback. To remedy these problems, the report proposes two “must-have” changes to observation rubrics: (1) pay more attention to lesson content; and (2) pare observation rubrics down to make them more focused and clear. These “must haves” may or may not address some problems of classroom observations, but there is good reason to conclude that they won’t provide much benefit. The report offers no research-informed argument that new observation rubrics would remedy known implementation problems in teacher evaluation systems, such as inadequate observer training, insufficient monitoring of rater calibration, and observers’ lack of time or skill in providing instructional feedback. Tools that help observers focus on lesson content may guide substantive improvements, but the report does not offer a strong rationale for that shift. Streamlined instruments and curriculum orientation may also hold some promise, but are unlikely to seriously address core problems surrounding teacher evaluations.
Review of An Opportunity Culture for All: Making Teaching a Highly Paid, High-Impact Profession
This report from a think tank called Public Impact begins with two unsupported premises: that only one in four teachers is good enough to help close achievement gaps, and that current efforts to recruit and retain excellent teachers are inadequate. To allow existing excellent teachers to reach more students and to develop excellence in their colleagues, it proposes a model for restructuring teaching. Hierarchically arranged teaching teams would rely on fewer teachers but more paraprofessionals, more digital instruction, longer work hours, and some larger classes. Teacher salaries would increase. However, while the report targets teacher excellence, it offers no specific means of identifying and assessing that quality. In addition, the report does not take into account relevant research literature in key areas, including teacher assessment, multiple influences on student achievement, digital instruction, teacher burnout, and teacher attrition. Overall, the proposal is based on unsupported assumptions, assertions, and projections: wishes and beliefs that, if the approach were put into practice, it would somehow play out to the benefit of students. Lacking an empirical base, the report is not a useful guide for policy.
Review of Does Sorting Students Improve Test Scores?
This National Bureau of Economic Research working paper purports to examine the extent and effects of sorting students into classrooms by test scores. It then claims to explore the effect of sorting on overall student achievement as well as on low achievers, high achievers, gifted students, special education students, and Limited English Proficient students. The paper uses standardized Texas state test scores as the measure of learning growth. Based on a comparison between third- and fourth-grade scores, the paper concludes that sorting students by scores is associated with significant learning gains for both lower and higher achievers. It does not, however, find similar effects for the sub-groups. The paper is limited by several important methodological issues. First, it simply assumes, based on test score distributions, that the schools tracked students between classes, and this assumption is highly questionable. Second, it provides no criteria by which students were classified as high or low achievers. Finally, it measures only the relative standing of students on two proficiency tests given in different years; it does not measure growth. Because of these and other weaknesses, this paper should not be used to inform policy regarding tracking or grouping practices.
Review of Gathering Feedback for Teaching: Combining High-Quality Observation with Student Surveys and Achievement Gains
This second report from the Measures of Effective Teaching (MET) project offers ground-breaking descriptive information regarding the use of classroom observation instruments to measure teacher performance. It finds that observation scores have somewhat low reliabilities and are weakly though positively related to value-added measures. Combining multiple observations can enhance reliabilities, and combining observation scores with student evaluations and test-score information can increase their ability to predict future teacher value-added. By highlighting the variability of classroom observation measures, the report makes an important contribution to research and provides a basis for the further development of observation rubrics as evaluation tools. Although the report raises concerns regarding the validity of classroom observation measures, we question the emphasis on validating observations with test-score gains. Observation scores may pick up different aspects of teacher quality than test-based measures, and it is possible that neither type of measure used in isolation captures a teacher’s contribution to all the useful skills students learn. From this standpoint, the authors’ conclusion that multiple measures of teacher effectiveness are needed appears justifiable. Unfortunately, although the design called for random assignment of students to teachers in the final year of data collection, the classroom observations were apparently conducted prior to randomization, missing a valuable opportunity to assess correlations across measures under relatively bias-free conditions.
NEPC Review: Passing Muster: Evaluating Teacher Evaluation Systems (April 2011)
Baker expresses concern that Passing Muster is an example of “…technicians working within the political arena…deferring judgment on important technical concerns that have real ethical implications.” Although the authors of Passing Muster claim no preference for specific types of evaluation systems, their rating system effectively suggests that value-added measures should be the benchmark for evaluating teacher evaluation systems, simply because they are available and not because they are good measures. Baker calls on the authors to admit their bias: “When a technician knows that one choice is better (or worse) than another, one measure or model better than another, and that these technical choices affect real lives, the technician should – MUST – be up front/honest about these preferences.”
NEPC Review: "Cross-Country Evidence on Teacher Performance Pay" and "Merit Pay International" (February 2011)
The primary claim of this Harvard Program on Education Policy and Governance report and the abridged Education Next version is that nations “that pay teachers on their performance score higher on PISA tests.” After statistically controlling for several variables, the author concludes that nations with some form of merit pay system have, on average, higher reading and math scores on this international test of 15-year-old students. Although the author lists numerous caveats, his broad conclusions do not heed these cautions. The fundamental differences among countries in the types of performance pay system are not properly considered. Nations are simply lumped together as having or not having a performance pay plan. Also, the length of time the program had been in place in each country is not addressed, and the unknown intensity of program implementation further argues against drawing lessons from this study. The small sample size of 28 observations requires extreme caution in interpretation. For example, the inclusion or exclusion of a single country results in large shifts in the size of the reported relationships. That is, the numbers become unreliable and invalid. With any correlational study, attributing causality is problematic; the differences among nations could be due to any number of factors. Finally, the type of regression-based analysis used to support the performance pay conclusion does not properly account for the fact that the background variables can vary in their relationships with student scores and carry different definitions across the countries under study. Therefore, drawing policy conclusions about teacher performance pay on the basis of this analysis is not warranted.
Suggested Citation: von Davier, M. (2011). Review of “Cross-Country Evidence on Teacher Performance Pay.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-pisa-performance-pay
Due Diligence and the Evaluation of Teachers
NEPC Review: Learning About Teaching (December 2010)
The Bill & Melinda Gates Foundation’s “Measures of Effective Teaching” (MET) Project seeks to validate the use of a teacher’s estimated “value-added”—computed from the year-on-year test score gains of her students—as a measure of teaching effectiveness. Using data from six school districts, the initial report examines correlations between student survey responses and value-added scores computed both from state tests and from higher-order tests of conceptual understanding. The study finds that the measures are related, but only modestly. The report interprets this as support for the use of value-added as the basis for teacher evaluations. This conclusion is unsupported, as the data in fact indicate that a teacher’s value-added for the state test is not strongly related to her effectiveness in a broader sense. Most notably, value-added for state assessments is correlated 0.5 or less with that for the alternative assessments, meaning that many teachers whose value-added for one test is low are in fact quite effective when judged by the other. As there is every reason to think that the problems with value-added measures apparent in the MET data would be worse in a high-stakes environment, the MET results are sobering about the value of student achievement data as a significant component of teacher evaluations.
Suggested Citation: Rothstein, J. (2011). Review of “Learning About Teaching: Initial Findings from the Measures of Effective Teaching Project.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-learning-about-teaching
NEPC Review: Great Teachers and Great Leaders (May 2010)
Great Teachers and Great Leaders (GTGL) is one of six research summaries issued by the U.S. Department of Education in support of its Blueprint for Reform. This review examines the presentation of research about improving teacher and administrator quality in GTGL and concludes that the research summary is seriously flawed: the report lacks sufficient analytic depth, does not present its evidence in a logical manner, makes sweeping claims, and draws conclusions based on weak data.
Suggested Citation: Shaker, P. (2010). Review of "Great Teachers and Great Leaders." Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/great-teachers