What Are Student Teaching Evaluations Good For?

LEE Li Neng
Department of Psychology
Faculty of Arts and Social Sciences

Using the analogy of evaluating the quality of Hainanese chicken rice, Li Neng argues for the value of student teaching evaluations (STEs) in informing faculty of their teaching quality.

Recommended Citation
Lee L. N. (2021, June 23). What are student teaching evaluations good for? Teaching Connections. https://blog.nus.edu.sg/teachingconnections/2021/06/23/what-are-student-teaching-evaluations-good-for/

Student teaching evaluations (STEs) have long been the dominant method used to evaluate teaching, and are often treated as synonymous with teaching evaluation in general (Seldin, 1999). However, there has been increasing hostility and cynicism towards STEs (Nasser & Fresko, 2002), with claims that STEs do not work (e.g., Hornstein, 2017; Stark & Freishtat, 2014). With studies lamenting that student ratings are (1) affected by grade expectations (Worthington, 2002); (2) biased towards male teachers (Boring et al., 2016); and (3) affected by class size and type (Miles & House, 2015), it is little wonder that a colleague remarked, “Well…it’s essentially a popularity contest, right?”

Considering that STEs are beset by various issues, the question remains: what are student teaching evaluations good for? Before we reject STEs outright, we need to understand what they really measure. First, it is important to note that well-constructed STEs measure teaching experiences, i.e., whatever students experience in the classroom, and not teaching effectiveness, i.e., how teaching translates into observable learning outcomes (Chalmers & Hunt, 2016). Students’ feedback on teaching has been shown to be useful for informing teaching when accompanied by appropriate consultation (Marsh & Roche, 1997). As such, we should not underestimate what students can discern.

Imagine I were asked to evaluate a plate of Hainanese chicken rice: I am confident I could quickly tell whether it is good or bad. Do I know what exactly the hawker did (Is it the chicken, the seasoning, or the technique?) that made it really good? Not really. Do I know how to cook a plate of chicken rice? I wish! However, am I still a good judge of whether it is a good plate of chicken rice? Probably, as that judgement primarily comes from having amassed extensive experience with chicken rice (and various other hawker food) over many years of growing up Singaporean.



Similarly, students are capable of discerning what good teaching looks like. Recent meta-reviews of thousands of studies have concluded that STEs are (1) relatively valid when compared with various effective teaching indicators (e.g., student achievement, expert ratings of teaching behaviour, self-ratings, alumni ratings; see Spooren et al., 2013), and (2) relatively unaffected by potential biases (Marsh, 2007), directly addressing the concerns raised above.

In addition, these studies indicate that the relationship between grade expectations and STEs might be due to a third variable, i.e., how much students have learnt. Students who learnt more are then likely to think that they would do well, and subsequently, provide better ratings (Marsh & Roche, 2000). Moreover, claims that certain variables bias STEs (e.g., the instructor’s gender) accounted for very little variance in STEs (i.e., 1% or less; Smith et al., 2007). In sum, if STEs indicate that our teaching is good (or bad), it is still wise to focus on what is happening and why.

However, it should be made clear that while students are likely able to differentiate between good and bad teaching, they may not be able to articulate why it is so. As with the chicken rice analogy, students may not have a clear idea of what exactly contributed to their great learning experience. For that, we rely on other sources of information. Chalmers and Hunt (2016) highlighted that besides STEs, other valuable sources of evidence of good teaching include peers and colleagues, self-assessment, and student achievement.1

Figure 1: Sources of Evidence

This leads us to a second important issue: teaching quality is influenced by several factors (Bell et al., 2011) including:

  • teacher constructs (e.g., teacher knowledge)
  • student constructs (e.g., student beliefs), and
  • contextual factors (e.g., school leadership and policies).

Crucially, recent studies suggest that teaching quality is affected by student motivation (Fauth et al., 2020), and by a good fit between teacher and student characteristics (Feistauer & Richter, 2017). This suggests that (1) students may underestimate their own role when it comes to teaching quality; and (2) it is critical to manage motivation and expectations to create a “good fit”.

If I ordered chicken rice at a hawker centre, and what arrived instead was a plate of culinary foam with some mysterious grains, I would probably be bewildered and have lots to say (“What is this lousy, expensive dish that isn’t chicken rice?”). However, my evaluation would be very different if the same dish appeared after I made the same order at an avant-garde Michelin-starred restaurant.

A good fit between expectations and experience matters.

In this case, good student ratings of teaching quality are the result of a good fit between general expectations of great teaching and the actual experience (a.k.a. the “good hawker” approach, where one caters for the masses).

In summary, STEs are still useful for determining teaching quality, but how we interpret and use the feedback may differ depending on our understanding of the teacher, the student, and other contextual factors.

LEE Li Neng is a Senior Lecturer at the Department of Psychology, NUS Faculty of Arts and Social Sciences, and a Fellow at the NUS Teaching Academy. He is interested in understanding how education shapes the areas of critical thinking, curiosity, and creativity, and how technology can be utilised to provide a more personalised form of education. Li Neng was awarded the Annual Teaching Excellence Award (ATEA) from 2018 to 2020, and was inducted into the ATEA Honour Roll in 2020.

Li Neng can be reached at psylln@nus.edu.sg.


  1. Their review comprehensively discusses what each source of evidence is best used for when evaluating teaching.


Bell, C. A., Gitomer, D. H., & Croft, A. J. (2011). The contextual factors, constructs, and measures associated with teaching quality. Princeton, NJ: Educational Testing Service.

Berk, R. A. (2005). Survey of 12 strategies to measure teaching effectiveness. International Journal of Teaching and Learning in Higher Education, 17(1), 48–62. https://www.isetl.org/ijtlhe/pdf/IJTLHE8.pdf

Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, 10. http://dx.doi.org/10.14293/S2199-1006.1.SOR-EDU.AETBZC.v1

Chalmers, D., & Hunt, L. (2016). Evaluation of teaching. HERDSA Review of Higher Education, 3, 25-55. https://www.herdsa.org.au/herdsa-review-higher-education-vol-3/25-55

Fauth, B., Wagner, W., Bertram, C., Göllner, R., Roloff, J., Lüdtke, O., Polikoff, M. S., Klusmann, U., & Trautwein, U. (2020). Don’t blame the teacher? The need to account for classroom characteristics in evaluations of teaching quality. Journal of Educational Psychology, 112(6), 1284–1302. https://doi.org/10.1037/edu0000416

Feistauer, D., & Richter, T. (2017). How reliable are students’ evaluations of teaching quality? A variance components approach. Assessment & Evaluation in Higher Education, 42(8), 1263-1279. https://doi.org/10.1080/02602938.2016.1261083

Greenwald, A. G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist, 52(11), 1182-1186. https://psycnet.apa.org/doi/10.1037/0003-066X.52.11.1182

Hornstein, H. A. (2017). Student evaluations of teaching are an inadequate assessment tool for evaluating faculty performance. Cogent Education, 4(1), 1304016. https://doi.org/10.1080/2331186X.2017.1304016

Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319-383). Springer, Dordrecht.

Marsh, H. W., & Roche, L. A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52(11), 1187.

Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students’ evaluation of teaching: Popular myth, bias, validity or innocent bystanders? Journal of Educational Psychology, 92, 202–228. http://dx.doi.org/10.1037/0022-0663.92.1.202

Miles, P., & House, D. (2015). The tail wagging the dog: An overdue examination of student teaching evaluations. International Journal of Higher Education, 4(2), 116-126. https://doi.org/10.5430/ijhe.v4n2p116

Nasser, F., & Fresko, B. (2002). Faculty views of student evaluation of college teaching. Assessment & Evaluation in Higher Education, 27(2), 187–198. https://doi.org/10.1080/02602930220128751

Seldin, P. (1999). Current practices–good and bad–nationally. In P. Seldin & Associates (Eds.), Changing practices in evaluating teaching: A practical guide to improved faculty performance and promotion/tenure decisions (pp. 1–24). Bolton, MA: Anker.

Smith, S. W., Yoo, J. H., Farr, A. C., Salmon, C. T., & Miller, V. D. (2007). The influence of student sex and instructor sex on student ratings of instructors: Results from a college of communication. Women’s Studies in Communication, 30, 64–77. http://dx.doi.org/10.1080/07491409.2007.10162505

Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598-642.

Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research, 9, 2014. http://dx.doi.org/10.14293/S2199-1006.1.SOR-EDU.AOFRQA.v1

Worthington, A. C. (2002). The impact of student perceptions and characteristics on teaching evaluations: A case study in finance education. Assessment and Evaluation in Higher Education, 27(1), 49–64. https://doi.org/10.1080/02602930120105054
