This is a brief post meant to stimulate conversation about a question faculty and other instructors across campus should be asking: in a decade in which studies have indicated that student evaluations of teaching involve bias against certain groups of instructors, how should our teaching evaluations at the University of Alberta, the USRIs (Universal Student Ratings of Instruction), be used?
The issue was on the floor Monday at the first 2017-18 meeting of the General Faculties Council (GFC), in relation to a report tabled by the Committee on the Learning Environment (CLE) responding to a 30 May 2016 motion of GFC. The problem is that neither the formal paperwork nor the report itself cited the motion correctly. As a result of an amendment I moved at that 30 May 2016 meeting, the motion included a crucial word that should have been a touchstone for the committee’s work. The word omitted from the formal paperwork and from every mention of the motion in the report is in blue:
THAT the General Faculties Council, on the recommendation of the GFC Executive Committee, request that the GFC Committee on the Learning Environment report by 30 April 2017, on research into the use of student rating mechanisms of instruction in university courses. This will be informed by a critical review of the University of Alberta’s existing Universal Student Ratings of Instruction (USRIs) and their use for assessment and evaluation of teaching as well as a broad review of possible methods of multifaceted assessment and evaluation of teaching. The ultimate objective will be to satisfy the Institutional Strategic Plan: For the Public Good strategy to: Provide robust supports, tools, and training to develop and assess teaching quality, using qualitative and quantitative criteria that are fair, equitable, non-discriminatory and meaningful across disciplines. CARRIED
The paragraph on page 4 of the CLE report dealing with studies indicating that student evaluations involve a gender bias reads as follows:
● Gender: The literature in this category is extensive and conflicted. Numerous articles in this subcategory report gender differences or no differences in student evaluations of teaching. For example, Boring, Ottoboni, and Stark (2016) concluded that student ratings are “biased against female instructors by an amount that is large and statistically significant.” On the other hand, Wright and Jenkins-Guarnieri (2012) conducted a meta-analysis of 193 studies and concluded that student evaluations appear to be free from gender bias. The University of Alberta TSQS conducted descriptive analyses and the results showed there is no apparent difference between scores for males (N = 18,576, Mdn = 4.53) and females (N = 13,679, Mdn = 4.57) for statement 211 (“overall the instructor was excellent”).
In my remarks at GFC I noted that it should concern us that this paragraph is cursory in its treatment of the possibility that student evaluations of teaching involve gender discrimination. I also find the description of the “literature” as “extensive and conflicted” odd given that the table in the report clearly shows that studies indicating gender bias are far more numerous than those that do not. But how about that last sentence?! What do you make of that?
I didn’t mention the sentence in my prepared remarks, partly because I aimed to keep those remarks to no more than two minutes. Over a hundred people sit on GFC, and on such a weighty matter I assumed that many colleagues would want to speak. Instead, the presentation team from CLE defended their position by citing that sentence. I cannot for the life of me see how that sentence shows anything. To say that the median scores for men and women are virtually the same tells us nothing about how those scores are achieved. It surely obscures much. Right? But I’m just a Shakespearean . . . .
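To make the worry concrete, here is a toy sketch in Python. The numbers are invented for illustration and have nothing to do with the actual TSQS data: the point is simply that two sets of ratings can share exactly the same median while one of them is sharply polarized, which is the pattern you might expect if a subset of raters systematically punished one group of instructors. A median comparison alone cannot distinguish these cases.

```python
# Illustration only: invented ratings, not University of Alberta data.
import statistics

# Group A: ratings cluster tightly around the middle of a 5-point scale.
group_a = [4.0, 4.3, 4.5, 4.5, 4.5, 4.7, 5.0]

# Group B: same median as Group A, but polarized -- several very low
# scores pulled down by a hostile subset of raters, offset by high ones.
group_b = [1.0, 1.5, 2.0, 4.5, 5.0, 5.0, 5.0]

for name, scores in [("A", group_a), ("B", group_b)]:
    print(name,
          "median =", statistics.median(scores),
          "mean =", round(statistics.mean(scores), 2),
          "stdev =", round(statistics.stdev(scores), 2))
```

Both groups report a median of 4.5, yet their means and spreads differ markedly; equal (or near-equal) medians are compatible with very different underlying distributions.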
GFC, by the way, was being asked to endorse a set of recommendations in which our USRIs would continue to be used for summative purposes — that is, for merit, tenure, and promotion decisions. I stated that we need to take seriously the statement on page 10 of the report that indicates the reservations of some chairs: “Some department chairs expressed concerns around biases, validity, and the potential for misinterpretation of USRI results for summative purposes of promotion and tenure decisions.”
This, in my view, is exactly the issue. Instructors at the University of Alberta need to receive formal feedback from their students about their courses. Formative feedback on their teaching from their students is important, and we could seek that feedback by more sophisticated means than we currently do. But with a growing body of research indicating that student evaluations of teaching involve bias — the most significant studies concern bias in the assessment of instructors who are women — it would not be responsible for GFC to continue to endorse the use of USRIs for “summative purposes.” I cited the conclusion of the Boring, Ottoboni, and Stark study to this effect. Its final statement reads as follows:
[T]he onus should be on universities that rely on SET [student evaluations of teaching] for employment decisions to provide convincing affirmative evidence that such reliance does not have disparate impact on women, underrepresented minorities, or other protected groups. . . . Absent such specific evidence, SET should not be used for personnel decisions.
The issue has now entered the international mainstream in the form of an article in last week’s Economist, which you can read here. The Economist discusses a study published last fall by another team of researchers, Mengel, Sauermann, and Zölitz. The Mengel, Sauermann, and Zölitz study is not mentioned in the CLE report. In the Economist it is discussed under the category “Academic Sexism.”
At GFC, President Turpin invited someone to move a postponement of consideration of the CLE’s recommendations. The matter will presumably return at the next meeting of GFC, scheduled for 30 October 2017. Can I hear from you before then? Especially about that darn sentence: The University of Alberta TSQS conducted descriptive analyses and the results showed there is no apparent difference between scores for males (N = 18,576, Mdn = 4.53) and females (N = 13,679, Mdn = 4.57) for statement 211 (“overall the instructor was excellent”). I have heard some very scathing things about this statement from people whose disciplines make their critique significant, but I’d like to hear more.
Shall I also ask Boring (Institut d’études politiques de Paris), Ottoboni (Berkeley), Stark (Berkeley), Mengel (University of Essex), Sauermann (Stockholm University), and Zölitz (Institute on Behaviour and Inequality, Bonn, Germany), what they make of it? ; )
Oh, and for now you can read CLE’s report in full here.