I have a good friend who teaches physics at my school. He has taught for over 10 years, is National Board Certified, and has a PhD in biomedical engineering. Over the past several years, his students' scores on the North Carolina End-of-Course (EOC) test in physics have been at or near the top of the district. Last year, our school district decided to implement "benchmark" testing throughout the year in order to ensure that teachers and/or students are on track for the EOCs. These tests supposedly mimic the EOCs and give district officials probable cause to interrogate principals and teachers whose scores aren't where they should be.
On the first benchmark test of this year, my friend's physics students scored below the district average. The principal's response to this development was to ask my friend, "What's the problem in physics?" My friend reacted with indignation to this comment by the principal. Why did such a seemingly innocent question catalyze such a strong reaction from my friend? I think that unpacking this situation will reveal much about the dynamics within my school; further, I believe this situation is emblematic of some larger, common problems that occur vis-a-vis public schools in the Testing Era.
1) If poor student test scores can be grounds for teacher critique, good test scores should be used for encouragement. At no time during the past seven years of teaching in a high-stakes accountability environment has my friend received affirmation from the principal for his consistently outstanding test scores. One can't have it both ways: either the test scores are meaningful (both positive and negative) and say something about the job the teacher is doing, or they are useless and don't warrant a response. It is demoralizing for teachers when administrators are silent about high test scores but jump on a teacher for low test scores.
2) Superficial conclusions will infuriate teachers. The question that the principal posed to my friend was really a conclusion: that there must be a problem in physics this year and that it must be simple enough to be explained right then and there, in a sound-bite. This occurred in the school's main office, where my friend had gone to drop off a form. The bell to signal the beginning of the next class period was set to ring momentarily. The question the principal asked does not have a simple answer. As such, it doesn't warrant the simplistic, 5-minute sound-bite answer ("I forgot to teach the unit on mechanics," "I neglected to tell the kids that g=9.8 m/s/s," "I told the students that they could use ballpoint pen to bubble in the answer form on the benchmark") that he must have been expecting. Hamstrung by the circumstances, my friend really had no way to respond in that context, in the busy main office of the school, in the middle of the day, 5 minutes before 24 students would be pounding on his classroom door to be let in. The person asking that question needed to be willing to work a little harder to hear a legitimate--rather than a simplistic--analysis of the state of his physics classes this year. Absent that, a little righteous indignation on the part of the teacher is perhaps understandable.
3) Superficial conclusions are based on superficial analyses. The test scores in question were the average percent correct for my friend's physics students on that benchmark test compared to that of the rest of the district. Hypothetically, let's say my friend's students had averaged 80% correct on the test while the rest of the district averaged 87% correct. That doesn't look too good. But what if his students were not especially high-achieving students? Then is getting 80% correct really so bad? What if the average math PSAT score of the other physics students in the district was 650, while the average math PSAT score of my friend's students was 575? Now those scores are actually beginning to look a little better. The problem with comparing average scores of different cohorts of students without controlling for any other variables is that it can only lead to superficial conclusions such as the one proffered by the principal.
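To make that last point concrete, here is a minimal sketch (using the hypothetical numbers above) of how adjusting for prior achievement can flip a raw-average comparison. The assumed relationship between PSAT math scores and expected benchmark performance is invented purely for illustration; it is not the district's actual model.

```python
# Hypothetical illustration: raw averages vs. averages adjusted for prior achievement.
# The relationship between PSAT math and expected benchmark score below is made up.

def expected_benchmark(psat_math_avg):
    """Made-up assumption: every 10 PSAT math points ~ 1 benchmark point above a baseline."""
    return 70 + 0.1 * (psat_math_avg - 500)

cohorts = {
    "rest of district": {"benchmark_avg": 87, "psat_math_avg": 650},
    "friend's class":   {"benchmark_avg": 80, "psat_math_avg": 575},
}

for name, c in cohorts.items():
    diff = c["benchmark_avg"] - expected_benchmark(c["psat_math_avg"])
    print(f"{name}: {diff:+.1f} points relative to expectation")

# With these made-up numbers, the district cohort lands +2.0 above expectation and
# the friend's class +2.5 -- the "lower" raw average no longer looks like a problem.
```

The exact numbers don't matter; the point is that the verdict can change once you account for who is actually sitting in the seats.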
In North Carolina there are two sets of test scores that are relevant for different reasons. The first set of relevant scores is basically the percentage of items that students get correct on a given test--these are called Composite Scores. This is the set of scores that gets published in local newspapers, that gets posted on school websites, and that most of the general public (and sadly probably many educators) thinks is the only data that matters. However, there is another data set calculated in North Carolina that is perhaps more important. This second set of data is student Growth Scores. For each test given in North Carolina, the state Department of Public Instruction has devised a formula that, given a student's past performance on other relevant tests, predicts his performance on the current test. If a student performs as predicted, he has Met Growth. If the student scores far enough below the predicted score, he has No Growth. If the student scores far enough above the predicted score, then he may achieve High Growth. In North Carolina, schools are awarded bonuses based on how many of their students meet the various levels of Growth. I would argue--assuming that the prediction formula is legitimate (which is a whole 'nother kettle of fish)--that Growth Scores are more powerful indicators of how the teachers and students are doing than are the Composite Scores. If a student who is predicted to score a 75 instead scores an 87, the teacher might be doing something really well. If a student who is predicted to get a 95 instead scores a 73, then there might be a problem. Sadly, growth scores are not published in the newspaper or on school websites, and in my experience they are even difficult to get my hands on as a teacher. I have yet to see individual growth scores for any of my students, yet it would be quite helpful for me to know which students are learning more than predicted and which are learning less than predicted. The fact that I just had to write a paragraph explaining what Growth Scores are is probably the reason why, despite their usefulness, these scores are not utilized for the benefit of increased student achievement: these scores require more than a superficial understanding to interpret.
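For readers who like to see the mechanics spelled out, here is a minimal sketch of how such a growth classification could work. The prediction formula, the tolerance band, and the function names are all hypothetical; they stand in for whatever the Department of Public Instruction actually uses.

```python
# Hypothetical sketch of a growth-score classification. The prediction formula and
# the tolerance band are invented for illustration; this is not NC DPI's actual model.

def predict_score(prior_scores, weight=0.9, baseline=10.0):
    """Predict this year's score from past test performance (assumed simple linear model)."""
    return weight * (sum(prior_scores) / len(prior_scores)) + baseline

def classify_growth(actual, predicted, tolerance=5.0):
    """Label a student's growth relative to his own predicted score."""
    if actual >= predicted + tolerance:
        return "High Growth"
    if actual <= predicted - tolerance:
        return "No Growth"
    return "Met Growth"

predicted = predict_score([72, 74, 70])   # roughly a 75 predicted score
print(classify_growth(87, predicted))     # High Growth -- the teacher may be doing something well
print(classify_growth(73, 95))            # No Growth   -- there might be a problem
```

Notice that the comparison is always to the individual student's own predicted score, not to the district average--which is exactly why a class of lower-achieving students can still post strong growth.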
It also turns out that AP Biology has become wildly popular at my school. This is the result of its being taught by a wonderful teacher and the fact that the class takes a trip to an exotic, tropical location each spring break. As a consequence, many high-performing juniors are enrolling in AP Biology in lieu of physics. Since only 3 years of science are required for graduation, and because my school is an arts magnet school, many of these students stop taking science after AP Biology. This swell in AP Biology enrollment means fewer high-achieving students are taking physics, and it seems reasonable that this trend could cause average physics scores--BUT NOT NECESSARILY GROWTH SCORES--to drop.
In fact, my friend the physics teacher has led the district and my school in Growth Scores over the past five years. Because of the growth scores of his students in physics and physical science, he is almost single-handedly responsible for my school receiving bonus money for each of the past five years. And the students he currently teaches, those same students who scored below the district average on the benchmark test, may very well receive high growth scores on the EOCs at the end of the year because those scores take into account that his students are not necessarily the same as the other physics students around the district. So the question of "what's the problem in physics" should probably become "is there a problem in physics?" I guess we'll have to wait and see.
But does anyone (policy makers, administrators, the general public) really want to take the time to look this deeply at the issues? It's much easier just to conduct a superficial analysis of the situation ("our average scores have dropped") and then make a superficial conclusion ("somebody must be doing something wrong").
4) Superficial conclusions lead to superficial solutions. The district in which I teach seems to have a different silver bullet each year for fixing our many problems. However, one silver bullet that has stuck around is a district-wide intranet called RIO. RIO contains unit plans, pacing guides, and a smattering of lesson plans that all teachers in the district are supposed to utilize in order to make student learning more uniform from school to school. This, in turn, is supposed to help students learn better and, of course, raise test scores. It makes sense, then, that after the principal asked my friend what the problem was in physics, he followed that question up with, "Have you been using RIO?"
What's funny about that question, and about the district's almost fanatical emphasis on RIO as the solution to all of our problems, is that there is no evidence I'm aware of that RIO is an effective means of improving student achievement. Sure, it cost lots of money... sure, it involves technology... sure, it sounds sexy... but has it been shown to increase student learning? There is no data to suggest that the lesson plans on RIO are any more effective than other lesson plans teachers have been using. There is no data to suggest that the pacing guide on RIO is any more effective than other pacing guides teachers have used. In fact, if our district's academic performance is any indication, RIO's early returns look pretty bad. Last year was the first full year that RIO had been implemented, and the district's test scores last year--both Composite and Growth Scores--were at an all-time low. Furthermore, when my friend was leading the district in both Composite Scores and Growth Scores over the past five years, RIO didn't even exist!
It seems that when it comes to education, people want simple solutions such as "Every teacher must do X to improve test scores." These superficial solutions--some of which can be quite expensive--can sound really good, but if they are based on superficial conclusions, which are based on superficial analyses, then we're really not making much progress.
Wednesday, January 24, 2007
1 comment:
I would like to encourage your analysis. This is good stuff. I guess I am curious what people think alternatives to "standardized tests" might be.
- He who stood in line "back in the day"