I spend a lot of time – arguably too much time – thinking about how teaching is evaluated, how I’d like to be evaluated, and what causes variation in teaching evaluations. This post covers several of those topics, and relates to my post from last week on pros and cons of flipping the classroom.
In comments on my post last week (including some on twitter), some people pointed out (correctly) that student frustration leading to lower teaching evaluation scores is primarily a problem if student teaching evaluations are weighted reasonably heavily. Heavily weighting student evaluations is a problem, given that such evaluations are known to be biased. Given that evaluations are affected by class size, the instructor's grading reputation, response rate, and instructor gender (among other things), ideally faculty teaching would be evaluated in ways that go beyond student evaluations. Fortunately for me, both Georgia Tech and Michigan had peer evaluations of teaching. These occurred at multiple points pre-tenure, and aimed both to give constructive feedback on teaching and to evaluate teaching in a way that wouldn't penalize good instructors for being challenging, female, etc.
How much weight do student evaluations of teaching hold? My impression is that it's variable. I don't know of anywhere where they are completely ignored, but it seems like some places don't put much weight on them; at other schools, they carry a lot of weight. Part of why I wrote the post last week was email conversations (spurred by this earlier blog post) with some pre-tenure faculty at other universities who are receiving very strong pressure to increase their teaching evaluations prior to coming up for tenure. And Terry McGlynn has said that "student evaluations are the main method used to evaluate [his] teaching." Clearly there are institutions where teaching evaluations carry a lot of weight, despite all their flaws.
Regardless of how heavily they are weighted, there are two questions related to student evaluations of teaching that I've been wondering about. First, what is the most important question on those evaluations at different institutions? And, second, what would be my ideal (free-response) feedback from a student?
At Georgia Tech, the most important question was "The instructor was an effective teacher." At Michigan, it is "Overall, the instructor was an excellent teacher." Based on conversations with someone at Michigan's excellent Center for Research on Learning and Teaching, even a seemingly small change like that in the wording can influence how students rate an instructor. Given the typical 1-5 Likert scale (strongly disagree to strongly agree), some students who think an instructor was very good will decline to mark "strongly agree" when asked whether the instructor was an "excellent" teacher. That surprised me, since I assumed most people mentally transform the scale into something like: The instructor was a [1=very bad 2=bad 3=okay 4=good 5=excellent] teacher. This got me wondering how universities settle on the wording of these questions. (I'm sure there's a committee!) I'd love to know more about how the wording of the question influences responses, but haven't had time to research it. What is the main question at your college or university?
To move to the second question, about my ideal evaluation: I think my ideal student evaluation would be something like "Dr. Duffy's class was challenging but fair. It taught me how to think critically and made me realize how interesting ecology is. I have a much better understanding of how science works now and feel prepared to move on to upper-level courses." I don't expect to actually get that evaluation (for starters, I'd be kind of shocked if a student used the phrase "challenging but fair" in an evaluation), but thinking about it helps me articulate what I consider important when teaching: teaching students to think critically, improving their skills related to the process of science (for example, reading figures and evaluating experimental design), and making them enthusiastic about ecology as a topic.
Coming back to the topic of how to evaluate teaching in ways that don't center on student evaluations: I would love to see how students who took a flipped version of Intro Bio compare with students who took a traditional lecture format when they get to Ecology, Evolution, and Genetics. Do they do better in those upper-level courses? Worse? It would also be great to look at retention rates – are students who take courses in one format or the other more likely to move on to those upper-level courses? Those measures are what I really want, but I don't have them yet.
In the meantime, I wonder how much I should pay attention to student evaluations. I mostly try to focus on what the pedagogical literature says is effective, but I would be lying if I said that I don't think about student evaluations. I think it would be good if I thought about them less, but I haven't figured out how to do that yet. I do have a post in the queue for next week that is mostly in jest, but that might contain some truth about how to do it. (Preview: it would involve bombarding myself with negative teaching evaluations daily.)
Finally, in preparing this post, I noticed that the Slate piece I linked to earlier said that, in that author's experience, most people don't read their teaching evaluations. I've heard people say that before, but I think most people I know do read them. So, I'm curious: do you read yours?