Answer Sheet: Why One Testing Expert Says New York State's Ever-Changing Standardized Testing Program is 'Useless'
Every spring, millions of students across the country take standardized tests in English language arts and math that are mandated by Congress as measures of accountability for a school’s performance in boosting student achievement.
And every year, there are reports about computer glitches with administration of standardized exams that affect when and how kids take the test, and there are questions about some of the exam questions.
For example, in New York state, computer-administered tests had to be delayed at least one day because of technical problems.
In Massachusetts, education officials just threw out a question on a 10th-grade standardized exam that was based on a passage from “The Underground Railroad,” a Pulitzer Prize-winning novel. It asked students to write an essay from the perspective of a racist white woman living in the South before the Civil War who is a character in the book.
The question, the Boston Globe reported, “sparked a range of questions among students, including whether using racist language would win them points for historical accuracy or deductions for inappropriateness.” School administrators complained to state officials, who announced the question would not be scored.
But beyond individual questions, critics of high-stakes standardized tests say the exams are not valid assessments of students or schools. In Albany, N.Y., on Monday, a group of teachers rallied outside the state’s education department and called for changes in the testing program.
Jolene DiBrango, vice president of New York State United Teachers, said other problems include a mismatch between test content and student grade level, and Buffalo’s NPR news station WBFO reported there are questions about proficiency benchmarks.
New York state officials defend their testing program, saying they have responded to criticism by shortening exams and ending time limits for students to take them. And they say the tests are fair and valid.
Responding to the computer problems during recent testing, New York State Education Commissioner MaryEllen Elia issued a statement saying in part:
“As we have done from the start, we will ensure no school is unfairly penalized for participating in computer-based testing. A thorough comparability analysis will be done to review the student computer-based results and paper-based results. We will make the appropriate adjustments to student scores as we did last year.”
But even the state’s attempts to improve the program based on complaints about it have presented their own problems, according to testing expert Fred Smith, who explains in the piece below why he believes the New York state standardized testing program is useless.
Smith is a consultant who retired after decades working as an analyst with the New York City public school system. A version of this appeared in the New York Daily News.
By Fred Smith
It’s spring testing season and some 1.2 million students in New York state are spending several days taking federally mandated English language arts and math standardized tests — just like they do every year. But to what end?
Congress requires that states test students in grades 3-8 every year to, supposedly, measure how well students are mastering standards and to show how districts and states are doing in helping them succeed.
But that’s not what really happens. I argue that the annual New York testing program is fundamentally useless. Its inefficacy can be seen by tracking changes in the program itself over the last decade — from the time the Common Core Learning Standards were introduced to all public schools until today.
Basic transformations occurred along this “Core-aligned” testing timeline that render efforts to compare the results from year to year an exercise in futility:
- The testing framework was revised in 2010 and 2011 at the insistence of the Board of Regents chancellor as part of her sweeping “education reform agenda,” which made more “rigorous” exams its cornerstone.
- A transition period (2012) allowed the new test publisher, Pearson, Inc., only one year to familiarize itself with the scope of the statewide program prior to full-fledged inauguration of the Common Core Learning Standards. (Implementation of the Core in the state was rushed and teachers had little time to prepare.)
- Core-aligned tests were initiated in 2013 to establish a baseline against which to measure student progress in meeting the standards.
- There was a seismic shift in the statewide testing population in 2014 and 2015, with 20 percent of students opting out of the exams. (Most of the resistance arose in schools and districts outside of New York City, where parents saw how education had been shackled by all the testing.)
- Time limits were removed from the tests in 2016, taking away uniformity in their administration. Without a single “pencils down” moment for all students, standard testing conditions ceased to exist. Children took as much time as needed to complete their exams. Such free-range testing negates any chance to evaluate how students and schools stand relative to each other even within the same year, much less year to year.
- In 2017, a new publisher was brought in (Questar Assessment) after a handoff from Pearson, which lost its contract with New York state after repeated complaints by students and educators about the validity of some test questions. This change created an unaccounted-for source of variation in the construction of the exams and the results they yield.
- Concurrent with the arrival of Questar, the State Education Department (SED) added another confounding variable by debuting computer-based tests in hundreds of schools that had the technological capacity to administer them. Have-not schools continued to take traditional, No. 2 paper-and-pencil tests. How did having two testing modes differentially impact the results? We don’t know.
- In 2018, the number of combined English and math testing days was reduced from six to four, the tests were shortened and the scoring scale was altered, further defying attempts to make sense of results or draw conclusions about progress.
How do you keep score, either of individual kids’ progress or of schools’ overall performance, when the rules keep changing and the goal posts keep moving? You can’t.
There is one finding, however, that is sufficiently robust to have emerged despite the vagaries of an ever-changing program. The youngest children (400,000 third- and fourth-graders statewide in New York) have been befuddled by the ELA and math exams. So too have English Language Learners, special education students and children of color on whom the tests have had a negative impact.
Underscoring the futility of New York’s system, there has been immediate feedback from parents and schools about the 2019 exams after the first two days. Significant numbers of glitches and technical disruptions occurred with the computer-based tests, and test administrations had to be delayed at least one day.
In addition, we are hearing about a wide range of time that students are taking to finish the tests — from 50 minutes to hours. Such latitude follows directly from an SED memo to superintendents and principals that says: “As long as they are working productively, students should be allowed as much time as they need only within the confines of the regular school day to complete each session.” [SED’s emphasis.]
Cases have been reported of students working all morning or even later. Without uniform test-taking conditions, children in the same schoolroom, grade or district cannot reasonably be compared with one another.
In virtually every year from 2012 through 2018 there have been differences in the publishers, the test population and the test parameters. Such discontinuity is antithetical to the establishment of a coherent testing system.
When are we going to move on from this confusion?
This blog post has been shared by permission from the author.
The views expressed by the blogger are not necessarily those of NEPC.