Student evaluations and how to use them have been common topics for discussion here and at other universities for many years. As a result, I wasn’t surprised to see yet another report and set of summary recommendations from the UofA on this issue. I was surprised, however, by some of the conclusions drawn from the research reviewed in the report. In several cases, the conclusions do not fit the available evidence, which, I would argue, is quite problematic.
Because I don’t have time to comment on all of the research presented in this report, I’m just going to discuss some of the research on gender and race bias in evaluations, since this area connects to some of my own research on discrimination more broadly. Here, I was very surprised to see the authors come to the conclusion that, “The literature in this category is extensive and conflicted” (p. 4). Is the evidence really that conflicted? Even the list of literature included in Appendix A shows a large and persistent gender bias across studies. It’s especially apparent in studies with a strong methodological framework and in more recent research on the topic, some of which was not actually included in the report.
For evidence of no bias, the report authors rely on a 2012 meta-analysis by Wright and Jenkins-Guarieri. On page 4 of the Summary Report, they state that “Wright and Jenkins-Guarieri (2012) conducted a meta-analysis of 193 studies and concluded that student evaluations appear to be free from gender bias.” However, if you read beyond the abstract, you will see that this article is actually a meta-analysis of meta-analyses, and that its gender finding rests on a single 1993 meta-analysis covering 28 studies (not the 193 cited in the report).
Moreover, Wright and Jenkins-Guarieri (2012) call into question their own findings regarding gender bias in evaluations. They note the following at the end of the article:
Lastly, it appears that interactions between instructor and student gender do not impact SET ratings significantly, according to one meta-analysis. This indicates that instructors should consider neither their gender nor that of their students when receiving and interpreting SET results. The results also suggest that administrators do not need to consider instructors’ gender when assigning instructors to various classes. However, the meta-analysis largely included studies from the 1970s, and more current research is needed before making conclusive statements based on gender (p. 693).
One area specifically that warrants further investigation is the biasing variable of gender. In examining the only meta-analysis focusing on gender as the primary biasing variable on SETs, Feldman (1993) concluded that gender had little effect on SETs, but more current research may suggest otherwise (Bachen, McLoughlin, and Garcia 1999; Centra and Gaubatz 2000); it is also worth noting that the majority of studies included in the meta-analysis were conducted in the 1970s (p. 695).
This is the primary study that the report authors use to conclude that “the literature in this category is extensive and conflicted” (p. 4). The other two studies the report authors list as showing no gender bias (Centra and Gaubatz 2000; Smith et al. 2007) also have limitations. For instance, Centra and Gaubatz (2000) were primarily interested in the gender of students, not the gender of instructors.
In contrast, the authors list seven studies that indicate some form of gender bias. These even include studies that apply experimental and audit methodologies (MacNell, Driscoll, and Hunt 2015), which tend to provide the strongest evidence of discrimination (Pager and Shepherd 2008; National Research Council 2004). Below, I have also listed some additional references on these issues that the report authors seem to have missed. Here, I would like to draw special attention to a forthcoming article by Mengel, Sauermann, and Zölitz (2017), which relies on a quasi-experimental dataset of 19,952 student evaluations from the Netherlands. Stark and Freishtat (2014) also provide a detailed review of SET evidence that is missing from this report.
In addition to the limitations in the review of gender bias in evaluations, I was also concerned about the lack of discussion of potential racial bias in student evaluations. I know that research on this topic is very limited, but we need to consider race as well. The few studies that do exist (Subtirelu 2015; Wagner, Rieger, and Voorvelt 2016) show some evidence of bias. This should be noted.
If the university insists on using USRIs to determine pay and promotion decisions for faculty, we must address these issues. If bias is present, as it appears to be in the research, this will limit opportunities for women and racial minority tenure-track faculty members. The potential consequences for contract instructors are much worse. In this case, evaluations can influence whether or not they will have a job the following semester. Finally, gender bias does not just affect women. Gendered expectations for male faculty can also influence student perceptions and evaluations (Sprague and Massoni 2005).
There is no doubt that we need more research on this issue (and many others). I absolutely agree with the report authors on this point. Many of the studies are small and connected to a single course or department. These could definitely be expanded. Let’s put together a larger one here. Let’s talk about analyzing USRI data at the University of Alberta. TSQS provided some information about the overall median scores for men and women on one question. Let’s take a look at each question. Let’s break down the results by course year, course size, and department. Let’s also look at scores over time. I would also love to see an analysis of written comments. How might these differ by gender, age, and race? Analyses of comments on RateMyProfessor.com (yes, not a random sample, but still informative) indicate that disparities are likely (see http://benschmidt.org/profGender/# for an interactive look at terms used to describe male and female faculty). We love to talk about analyzing “big data.” TSQS has a ton of it. Let’s analyze it and connect it to other data.
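To make the proposed breakdown concrete, here is a minimal sketch of what such a disaggregated analysis might look like. Everything in it is illustrative: the column names, scores, and categories are my own assumptions, not the actual TSQS/USRI schema, and a real analysis would of course use the full institutional dataset.

```python
import pandas as pd

# Hypothetical evaluation records -- column names and values are
# illustrative assumptions, not the actual TSQS/USRI data structure.
records = pd.DataFrame({
    "instructor_gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "course_year":       [1,   1,   2,   2,   1,   1,   2,   2],
    "course_size":       [200, 210, 35,  40,  190, 205, 30,  45],
    "overall_score":     [4.1, 4.5, 4.0, 4.4, 3.9, 4.6, 4.2, 4.3],
})

# Median overall score by course year and instructor gender -- the same
# question-by-question, category-by-category breakdown proposed above.
medians = (
    records
    .groupby(["course_year", "instructor_gender"])["overall_score"]
    .median()
    .unstack("instructor_gender")
)
print(medians)
```

The same `groupby` pattern extends directly to course size bands, departments, and individual questionnaire items, and to changes over time once a term variable is added.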
At a university, it is imperative that we use research to inform policy. It is even more important that we use good research to inform our policies. If we are going to spend the time and money to conduct such studies, we need to make sure that the research done is high quality and useful.
Bianchini, S., Lissoni, F., & Pezzoni, M. (2013). Instructor characteristics and students’ evaluation of teaching effectiveness: evidence from an Italian engineering school. European Journal of Engineering Education, 38(1), 38-57.
Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, 10.
Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.
Guarino, C.M., & Borden, V.M.H. (2017). Faculty service loads and gender: Are women taking care of the academic family? Research in Higher Education, 58(6), 672-692.
Handley, I. M., Brown, E. R., Moss-Racusin, C. A., & Smith, J. L. (2015). Quality of evidence revealing subtle gender biases in science is in the eye of the beholder. Proceedings of the National Academy of Sciences, 112(43), 13201-13206.
Loes, C. N., Salisbury, M. H., & Pascarella, E. T. (2015). Student perceptions of effective instruction and the development of critical thinking: A replication and extension. Higher Education, 69(5), 823-838.
MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291-303.
Mengel, F., Sauermann, J., & Zölitz, U. (2017). Gender bias in teaching evaluations. IZA Discussion Paper No. 11000. Available at SSRN: https://ssrn.com/abstract=3037907
Miles, P., & House, D. (2015). The tail wagging the dog: An overdue examination of student teaching evaluations. International Journal of Higher Education, 4(2), 116-126.
National Research Council. (2004). Measuring racial discrimination. National Academies Press.
O’Meara, K.A. (2016). Whose problem is it? Gender differences in faculty thinking about campus service. Teachers College Record, 118(8), 1-38.
Ortega-Liston, R., & Rodriguez Soto, I. (2014). Challenges, choices, and decisions of women in higher education: A discourse on the future of Hispanic, Black, and Asian members of the professoriate. Journal of Hispanic Higher Education, 13(4), 285-302.
Pager, D., & Shepherd, H. (2008). The sociology of discrimination: Racial discrimination in employment, housing, credit, and consumer markets. Annual Review of Sociology, 34, 181-209.
Sprague, J., & Massoni, K. (2005). Student evaluations and gendered expectations: What we can’t count can hurt us. Sex Roles, 53(11-12), 779-793.
Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research, 9.
Subtirelu, N. C. (2015). “She does have an accent but…”: Race and language ideology in students’ evaluations of mathematics instructors on RateMyProfessors.com. Language in Society, 44(1), 35-62.
Wagner, N., Rieger, M., & Voorvelt, K. (2016). Gender, ethnicity and teaching evaluations: Evidence from mixed teaching teams. Economics of Education Review, 54, 79-94.