Will grading easier result in higher course evaluation ratings?

By Ted Murcray

In a nutshell

Short answer: Maybe, but not likely.  A recent study found that easing up on grading was not associated with a significant increase in student evaluation of teaching (SET) scores. Making course expectations clearer is a more reliable path to higher course evaluation scores.

Longer answer: When studied as an independent variable, grades were not predictive of SET scores.  Instructors who were “easy” graders were less likely to have strong correlations between their grades and their SET scores, indicating that you can’t “buy” high SET scores by giving students better grades, as some researchers have suggested (Marsh, 1984). In fact, only 4.8% of the variance in SET scores can be explained by grading reliability, which means final grades in a class have very little to do with how students rate the class on course evaluations.

The same study then compared five items from the SET to the grading practices of the instructors and found that high ratings on items about course and instructor clarity explained 77% of the variation in SET scores.  In other words, lenient grading might yield a small bump in course evaluation scores at best, but instructors who want big changes in scores will need to work on the level of clarity found in the course.

How do we know if someone is an easy grader or a hard grader?

In 2018, Millet studied the grading practices of instructors in higher education by comparing the grades that an instructor assigned in one section of a class to the students’ overall GPA.  If an instructor assigned a grade that was consistent with the students’ GPA, then that instructor was considered reliable.  Lenient graders assigned grades that were higher than the students’ GPA, and tough graders assigned grades that were lower than the students’ GPA.  To recap:

Grading reliability: When the grade the student receives in the course matches the student’s GPA, the grade would be considered reliable because it is consistent with other data points.  This study didn’t measure reliability at the student level but at the course level, so student grades were averaged and compared to the average GPAs of the same students.

Grading leniency: When the grade the student receives in the course is higher than the student’s GPA, the grade might be considered lenient.

Tough Grading, or Strict Grading: When the grade the student receives in the course is lower than the student’s GPA, the grading would be considered tough or strict.
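The three definitions above can be sketched in a few lines of Python. This is only an illustration of the course-level comparison as it is described here, not the metric Millet actually computed: the function name, the sample numbers, and the 0.25 grade-point tolerance are all hypothetical.

```python
from statistics import mean

def classify_grading(course_grades, student_gpas, tolerance=0.25):
    """Compare a section's average grade to the same students' average GPA.

    Grades and GPAs are on a 4.0 scale. The tolerance is a hypothetical
    cutoff chosen for illustration; it does not come from the study.
    """
    if not course_grades or len(course_grades) != len(student_gpas):
        raise ValueError("need one course grade and one GPA per student")
    gap = mean(course_grades) - mean(student_gpas)
    if gap > tolerance:
        label = "lenient"    # course grades run higher than GPAs
    elif gap < -tolerance:
        label = "tough"      # course grades run lower than GPAs
    else:
        label = "reliable"   # course grades are consistent with GPAs
    return gap, label

# Made-up data for one section of four students.
grades = [4.0, 3.7, 4.0, 3.3]
gpas = [3.2, 3.0, 3.5, 2.9]
gap, label = classify_grading(grades, gpas)
print(f"gap={gap:.2f}, label={label}")
```

In this made-up section, the average course grade sits well above the students' average GPA, so the section would be labeled lenient under the assumed tolerance.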

What is the leniency hypothesis and where did it come from?

Millet’s (2018) study did not include SET scores, so he could not connect grading practices to course evaluations in his article.  However, in his discussion, he suggested that easy graders may grade leniently in order to get higher SET scores, a theory posited by prior researchers such as Marsh (1984).  Marsh suggested that instructors “buy” high SET scores by giving high grades.

Brockx, Spooren, and Mortelmans (2011) noted that those who debate whether easy grading leads to high SET scores fall into two camps: the Leniency Hypothesis or the Validity Hypothesis.

Leniency Hypothesis: lenient grading causes higher SET scores

Validity Hypothesis: strong teachers teach so well that grades go up and students feel good about their learning, which causes higher SET scores. 

Millet suggested that future research should be conducted to see which of these hypotheses is true.  He reasoned that a researcher could replicate his study on grading reliability and use the results as an independent variable to evaluate the correlations between easy grading and SET scores.

So, do easy graders get higher SET scores?

Calkins et al. (2022) conducted the study that Millet suggested by comparing grading reliability data with SET scores.  This study drew on eleven (11) years’ worth of data from a large comprehensive university.  They looked to see if there was a positive correlation between grades and SET scores (high grades and high SET scores go together; low grades and low SET scores go together).  Then they compared the results using grading reliability as an independent variable (Does being an easy grader correlate with high SET scores? Does being a tough grader correlate with low SET scores?).

What they found was that grading reliability is not predictive of SET scores.  Only 4.8% of the variance in SET scores can be explained by grading practices.  Instructors who were rated as lenient graders had low correlations between their grades and their SET scores.  That means even though their grades were high, their SET scores were all over the place.  Instructors who were tough graders were more likely to have a strong correlation between their grades and their SET scores, but that doesn’t necessarily mean higher scores.  For example, if a tough grader gave all F’s, the SET scores would likely track those grades, which means the SET scores would also be low.

In short, this study’s findings are not consistent with the Leniency Hypothesis.  There is no evidence that instructors can “buy” higher SET scores by giving out higher grades.

Then, how do I raise my course evaluation scores?

Calkins and her team compared the items from the SETs administered with the grading reliability indicators and the overall SET scores.  They found that five items in the course evaluation, taken together, account for 77% of the variance in SET scores.  Those items are:

  • Objectives were clear
  • Course provided sufficient opportunity to learn
  • Grades adequately reflect the quality of my performance
  • Course challenged me intellectually
  • Course increased my knowledge

This aligns with the literature on strong course design.  Well-designed courses coupled with intentional course delivery result in higher course evaluation scores, regardless of the grades that the students earn.
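To get a feel for how far apart 4.8% and 77% are, it can help to convert each figure into the correlation it implies. The sketch below assumes both numbers are shares of variance explained (R squared), so the implied correlation is simply the square root. The function name is hypothetical, and for the five items taken together the result is a multiple correlation rather than a simple one.

```python
import math

def implied_correlation(variance_explained):
    """Return the correlation magnitude implied by a share of variance explained."""
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance explained must be between 0 and 1")
    return math.sqrt(variance_explained)

grading = implied_correlation(0.048)  # grading reliability
clarity = implied_correlation(0.77)   # the five SET items taken together

print(f"grading reliability: r is about {grading:.2f}")
print(f"five clarity items:  R is about {clarity:.2f}")
```

Under that assumption, grading reliability corresponds to a correlation of about 0.22, while the five items together correspond to about 0.88. In terms of variance explained, the five items account for roughly sixteen times as much as grading reliability does.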

Other interesting tidbits

Some other interesting findings from these two ground-breaking research studies:

  • Both studies found that instructor experience was negatively correlated with grading reliability.  This indicates that we become less consistent with our grading the longer we work in higher education.
  • Both analyses found a positive and significant influence on grading reliability by course level.  The more advanced courses are associated with greater grading reliability.

Research Questions

Interested faculty may want to continue this study.  Here are some additional questions that could be explored and might spark additional research questions.

  • What factors contribute to or explain the decrease in grading reliability as instructors gain experience?  Both qualitative and quantitative studies could explore instructors’ perceptions of their own grading practices and correlate those perceptions with grading reliability measures.
  • Why are instructors more consistent in grading upper-division courses? Is this a sign that they are giving more constructive feedback to majors within their program? Or is this an indication that grades become more even at the top levels because students have selected into the work?  Is something else contributing to this phenomenon?
  • How can we test the Validity Hypothesis of the relationship between SET scores and grades? Refuting the Leniency Hypothesis does not necessarily mean the Validity Hypothesis is accurate.  What could be done to determine the accuracy of that hypothesis?

If you are interested in any of these questions, feel free to grab them and begin your study!  If you feel you need assistance, please contact the TLC for more information!


Brockx, B., Spooren, P., & Mortelmans, D. (2011). Taking the grading leniency story to the edge. The influence of student, teacher, and course characteristics on student evaluations of teaching in higher education. Educational Assessment, Evaluation, and Accountability, 23(4), 289-306.

Calkins, C., Crooker, J., & Shi, Q. (2022, April 19-27). The influence of grading reliability and grading leniency metrics on student evaluations of teaching [Conference presentation]. AERA 2022 Conference, San Diego, CA, United States.

Marsh, H. W. (1984). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5), 707.

Millet, I. (2018). The relationship between grading leniency and grading reliability. Studies in Higher Education, 43(9), 1524-1535.
