This page is archived content from an older version of the Emerald Publishing website.

As such, it may not display exactly as originally intended.

How to evaluate teaching

By Margaret Adolphus

What is evaluation of teaching and why is it needed?

Sources of evaluation

Evaluation can be sought from a number of different sources, notably:

  1. Students, usually by questionnaire and often supplemented by interview and focus groups. Students are uniquely qualified to offer a "learner's eye view" (Hounsell, 2003). This form of evaluation is usually referred to as student evaluation of teaching (SET).
  2. Other teachers, for example through peer observation of a class.
  3. Teachers themselves, reflecting on their own work, perhaps writing a statement on a particular module, or keeping a diary to note particular teaching episodes as their relevance strikes.

Evaluation can be divided into two camps: summative and formative. Note that the examples below are mainly for an institution's internal purposes: it should not be forgotten that in an age of quality assurance, funding bodies may need to see an evaluation system in practice as an assurance that standards are being maintained. For a more detailed discussion on this, see: "International business schools and the search for quality".

Summative evaluation

Summative evaluation usually comprises a post-course questionnaire looking at student satisfaction. Its major use is to evaluate and compare faculty, and make decisions about rank, salary and tenure.

This type of evaluation, however, has received criticism on a number of grounds. It measures students' satisfaction with the course and the tutor, rather than their learning. In addition, teachers find themselves judged mainly on one aspect of their teaching, their class presentation and instructional delivery, rather than their course planning and design, supervision of research, visits to students on placement, mentoring, etc. For that reason, many argue (for example, Hounsell, 2003) that evaluation needs to be approached with a "wide-angle lens", to capture all of teaching's complex facets.

Formative evaluation

Formative evaluation may be better at giving this broader perspective. It is usually done mid-course, with detailed questions designed to shed light on pedagogy and teaching strategy. It may then be repeated to see whether any resulting changes have had an effect. Its objective is not to judge the worth of the faculty member, but to provide a developmental appraisal of their teaching skills. Any resulting issues can therefore be tackled at both an institution and individual level by courses or mentoring.


Publisher's note

The author is grateful to Dr Martin Oliver, Reader in Education (ICT) at the Institute of Education, University of London, for information with regard to the PhD student using the biographical narrative interpretive method in evaluation.

What does the research say about evaluation and what are the implications for practice?

There has been a considerable amount of research on the effectiveness of student evaluations.

Overview of the research

Nuhfer (2003), in his position paper for the Center for Teaching and Learning at Idaho State University, comments that there is a demonstrable link between student satisfaction and achievement. He also mentions that large class size negatively affects student evaluations, and that motivated students are more likely to rate a course highly.

Pounder (2007) provides a useful literature review of the "myriad studies" of evaluation. He shows that evaluation is affected by many different factors, such as:

  • the age of the student (mature students are likely to be more lenient),
  • the gender of the teacher (women can be discriminated against),
  • teacher personality,
  • course content, and
  • what grade the teacher awarded the student.

Pounder concludes that there are too many variables for one measure to be an accurate assessment, and that there is no correlation between the standard SET and student achievement. This does not mean, however, that one should reject student evaluation out of hand, but rather that traditional methods should be supplemented with other approaches.

A holistic approach

The framework presented here suggests that in the case of the SET process in its conventional form, its value is questionable as the sole measure of classroom performance since the quality, richness and diversity of what happens in the typical classroom cannot be captured by the SET process alone.

However, in the field of education, measures of classroom effectiveness are essential, despite the deficiencies of the conventional SET approach. There are therefore strong grounds for arguing that educational organizations can and should experiment with and develop approaches to assessing classroom dynamics that break from the conventional SET mould. Educational organizations might then be in the position to supplement the conventional SET with other approaches that have the potential to provide a richer picture, and more equitable assessment, of what happens in the classroom (Pounder, 2007: pp. 186-187).

Nuhfer (2003) also argues for a more holistic approach to evaluation and the inclusion of formative methods. He is sceptical about the value of some types of summative evaluation, which use global questions such as, "Overall, how do you rate this instructor's teaching ability compared to all other college instructors' teaching ability ... " on the grounds that they measure student satisfaction rather than learning. Such questions may be open to administrative abuse when teaching competence is deduced from them and tenure and promotion decisions made accordingly.

Gender bias

There is also widespread concern that standard evaluations may reflect negatively on women faculty. While this may be difficult to substantiate, and many researchers would deny any bias, there is concern that research does not look at contextual information, for example at the way women faculty may be assigned to larger classes.

Fox (2008) quotes a study by Basow (1998) in which the author gathered substantial quantitative data about the position of male and female staff members at her institution, and found that female teachers were at a disadvantage in terms of status and position. When the ratings favoured male lecturers, this doubled their disadvantage in that they were less likely to get tenure or promotion.

What does the research tell us about evaluation?

From the research it is possible to deduce a number of principles of effective practice for evaluation:

  1. No one instrument on its own can account for the multidimensional aspect of teaching and learning. It is necessary to find information from more than one source, and to triangulate with a number of different methods (Abrami, 1993 and Marsh 1995, quoted by the Center for Excellence in Teaching and Learning, 2010; Nuhfer, 2003; and Pounder, 2007). According to Fox (2008), " ... the reason that many studies of student evaluations of teaching so far have produced conflicting results is that too few of them account for multiple dimensions of teaching and learning".
  2. Formative as well as summative evaluation should be used. Nuhfer (2003) quotes studies that show formative evaluation, with a follow-up consultation with the students, is more effective than summative evaluation on its own. Just obtaining quantitative information and analysing the numbers will not give sufficient clarity or depth.
  3. Know at the start what you are evaluating. According to Johnstone (2005: p. 2), "the purpose of the teaching innovation and the expected outcomes in terms of student learning and attitude changes must be specified". And as no one instrument is suitable for all evaluation purposes (Nuhfer, 2003), design the tool carefully to measure what you have decided is important.
  4. Evaluation should focus not merely on individual teachers and their performance, but also on improving teaching and learning generally (Center for Excellence in Teaching and Learning, 2010).
  5. Avoid subjective questions about the personality or charisma of the teacher, and focus on educational attainment (Nuhfer, 2003).
  6. A range of variables can affect evaluation, such as student motivation to take the course, grade anticipation, size of class, etc., and, arguably, the gender of the instructor (Fox, 2008; Pounder, 2007).
  7. Instructors receive different ratings on different courses, therefore reliable judgements cannot be made on the basis of one course (Cashin, 1988, quoted in Nuhfer, 2003).
  8. There needs to be a common understanding between faculty and administrators about the various purposes of evaluations, and mechanisms put in place, such as a system of appeal, to ensure fairness and discourage abuse (Center for Excellence in Teaching and Learning, 2010).

How to evaluate

Effective evaluation depends on keeping the principles listed in the previous section in mind, knowing what you are evaluating, and designing the instrument accordingly. A fit-for-purpose instrument will be valid; one that measures consistently among students and over time will be reliable (Center for Excellence in Teaching and Learning, 2010).

No one instrument is adequate for evaluation, and you may need to triangulate using other instruments, for example questionnaires can be combined with interviews or focus groups.

What to evaluate

If you are evaluating a particular course, you need to design an instrument that reflects the totality of that course. This means not just the class element, but also course design, organization, assessment, etc. You can then group the questions according to factors.

For example, you will want to include questions about:

  • Delivery of taught material: does the lecturer present things in a clear, organized way? Does he or she make the material interesting?
  • Assignments: are these appropriately paced, or too bunched? How helpful was feedback?
  • How helpful were resource materials, course websites, etc.?
  • The overall organization of the course, for example pacing.
  • Access to facilities, such as computing, and library.
  • Interactivity and student-centred approaches, for example group work (learning from students often gets overlooked in evaluation, focusing as it does on the lecturer).
  • The effect of the course on students' learning.

You may not want to evaluate the whole course, but particular aspects of it, for example the use of Twitter. Or you may want to measure the attitudes of the students themselves, for example, how does their attitude towards learning change over time? In that case, you will want several evaluations, each at a different stage.

Try to focus on the taught rather than the teacher: you should be more interested in what the students learned than in the teacher's personality and its effect on them. Look particularly at your department's and institution's criteria for effective teaching, and use these as guidance on what to evaluate.

Consider who to ask. Students may have valid views on some issues, such as whether or not the lecturer makes effective use of class time, but they cannot be expected to know whether or not the lecturer is up to date with the latest research (Gross Davis, 1993).

There are some issues that your colleagues are in a better position to assess, such as course aims, content, and material, possible assessment methods, and new instructional methods. You may also want to ask yourself how something went, perhaps writing a reflective statement at the end of a course or keeping a diary in which you record your perceptions of a particular class interaction.

Methods of evaluation: The questionnaire

The questionnaire is not the only instrument that can be used for evaluation, but it is the most common. Here are some points to bear in mind when designing one.

  • Use both open and closed questions, so that you reveal both quantitative and narrative data. The latter provides opportunity for the student to reflect and elaborate on their experiences.
  • For quantitative questions, avoid just using "yes/no" answers, which are very easy to fill in mindlessly. Instead, use a Likert rating scale, either 5- or 7-point, with 1 being the lowest rating and 5 or 7 the highest. These are useful for calculating an average response from the class.
  • You can also test the same dimension twice by using two different questions which ask similar things in different ways (Johnstone, 2005). Note how this is done in the example shown below in "Figure 1. Example of a questionnaire", taken from the University of Glasgow Centre for Science Education (MacGuire, 1991, quoted in Johnstone, 2005).
  • Group questions according to theme, and have a number of different questions in each.
  • Ask general questions about the teacher, but avoid the subjective. For example not, "Do you respect this teacher?", but something along the lines of, "How high do you rate the teacher's overall effectiveness?".
  • Make sure the questions are clear, and avoid anything ambiguous. For example, "The instructor is well prepared and marks work fairly" confounds two issues (Gross Davis, 1993).
  • Make the form as short as possible. Questionnaire fatigue can easily set in.
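The design points above can be sketched as a simple data structure: questions grouped by theme, mixing closed Likert items (including two that probe the same dimension in different ways) with an occasional open question, and keeping the total short. This is a minimal illustrative sketch; the question wording, theme names, and scale are invented for the example, not taken from the Glasgow instrument.

```python
# A minimal sketch of a questionnaire layout following the points above.
# All question wording is illustrative, not from any published instrument.
questionnaire = {
    "Delivery": [
        # Two Likert items probing the same dimension in different ways
        ("The lecturer presents material in a clear, organized way.", "likert"),
        ("I could follow the structure of each session.", "likert"),
        # One open question to gather narrative data for this theme
        ("What, if anything, made the sessions hard to follow?", "open"),
    ],
    "Assignments": [
        ("Assignments were appropriately paced across the course.", "likert"),
        ("Feedback on my work helped me improve.", "likert"),
    ],
}

# 5-point Likert scale: 1 = lowest rating, 5 = highest
LIKERT_SCALE = (1, 2, 3, 4, 5)

# Keep the form short: questionnaire fatigue sets in quickly
total_items = sum(len(items) for items in questionnaire.values())
print(total_items)  # prints 5
```

Holding questions in a structure like this makes it easy to group responses by theme at the analysis stage, rather than treating the form as one undifferentiated list.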

Figure 1. Example of a questionnaire (from the University of Glasgow Centre for Science Education)

Not all questionnaires are based on the one-time experience of a course: some are instruments measuring attitude and are designed to be repeated several times.

For example, Perry's Model, which is popular in some parts of science education, looks at the way students mature from wanting to be spoon-fed by the lecturer to greater independence and questioning. It has been used to measure the change from a cramming approach to one based on problem-based learning.

Longitudinal measurement is necessary to give a series of snapshots at different points in time. Students are presented with a series of statements to choose from and have to select the one they agree with (Johnstone, 2005). The questionnaire is supplemented by interviews as a way of providing richer data.

When it comes to administering the questionnaire, bear the following points in mind (Center for Excellence in Teaching and Learning, 2010; Gross Davis, 1993):

  • Set aside a time for students to fill in the questionnaire – perhaps 15 minutes at the end of the final session, or perhaps the week before the final session.
  • Ensure that students understand the purpose of the exercise – that it is part of your attempts at continuous course improvement.
  • Assure them of anonymity.
  • Get someone who is not the faculty member teaching the course to collect the questionnaires and take them to the faculty office.
  • Do not look at the forms until you have finished grading the course.

When carrying out the analysis of the questionnaires, the following are useful guidelines (Center for Excellence in Teaching and Learning, 2010; Gross Davis, 1993):

  • Establish a unit of analysis, such as class average for a response to a particular question.
  • Keep courses separate: aggregating data across courses will obscure trends.
  • Check the number of students who completed the forms against the class enrolment. Be cautious about a low completion rate, and do not attempt to summarize data if there are fewer than ten forms.
  • Prepare summary statistics for the quantifiable questions: frequency distribution, average response, standard deviation, departmental or other norm for comparison.
  • Summarize narrative comments for each question. Group the summary under headings, noting the number of comments under each heading. Bear in mind course aims and objectives and departmental goals.
  • For quantifiable questions, note your highest and lowest rated items. Do they reveal strengths and weaknesses which cluster in patterns, say on organization of material?
  • From the narratives, identify particular problems. Are complaints justified?
  • Look at factors that could influence the course – for example is it a large or small class? Is it one you are used to teaching?
  • Try to obtain the help of an experienced colleague, or perhaps someone in a teaching support unit, to go through the questionnaires with you.
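The summary-statistics step in the guidelines above (checking completion rate against enrolment, the ten-form minimum, frequency distribution, average response, standard deviation) can be sketched in a few lines. This is a minimal illustration; the response data, enrolment figure, and function name are invented for the example.

```python
from collections import Counter
from statistics import mean, stdev

def summarise(responses, enrolment, min_forms=10):
    """Summary statistics for one quantifiable (Likert) question.

    Returns None if fewer than min_forms forms were completed,
    per the guideline not to summarise very small returns.
    """
    if len(responses) < min_forms:
        return None
    return {
        "completion_rate": len(responses) / enrolment,
        "frequency": dict(sorted(Counter(responses).items())),
        "average": round(mean(responses), 2),
        "std_dev": round(stdev(responses), 2),  # sample standard deviation
    }

# Hypothetical 5-point Likert responses (1 = lowest, 5 = highest)
# to one question, from a class with an enrolment of 15.
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 5, 4, 4]
summary = summarise(responses, enrolment=15)
print(summary)
```

The frequency distribution makes clustering visible (here most ratings sit at 4), while the completion rate flags whether the class average is worth trusting at all.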

Other methods of evaluation

Other ways of obtaining feedback on teaching include more "conversational", qualitative methods. For example, interviews with a small sample of the questionnaire population, or structured focus groups, are often used to supplement questionnaires, which on their own can yield rather bland data.

Narrative methods, too, are becoming popular. For example, a PhD student at London University's Institute of Education used a biographical narrative interpretive method to obtain rich accounts of student experience on an online course. She found it particularly effective for probing students who were reluctant to engage.

One student, for example, described how he did his coursework after finishing his bar shift at 1am, as it was the only time he could gain access to a computer. Another, a refugee, used her first pay cheque to buy a computer and then had to negotiate access with her family. This is not the sort of data that you can easily obtain from a questionnaire.

There is only so much information that can be gleaned from students: one's peers also provide a useful source. The standard form of peer evaluation is the observed lesson, and observation has been built into quality assurance practices, for example in the UK. However, as with other evaluation methods it is most valuable when it is developmental, and is particularly useful for mentoring someone new to teaching.

The observer should have a checklist for what to look for, which may for a lecture include the clarity of the session, the aims and objectives, the delivery, the engagement of students in the learning, and opportunities for interaction. For a seminar or small group activity, the list should include facilitation skills, interaction, encouraging all students to participate, feedback, and helping students with their learning goals (Fullerton, 2003).

Teachers can evaluate themselves through reflecting on the quality of their teaching, either as a whole or as a result of a particular class or interview with a student.

Some institutions or professional bodies may require teachers to submit a teaching portfolio. This is a collection of documents which provides evidence of the work done and skills developed in teaching. The following are some examples of what it could include:

  • Student ratings and peer ratings or observations.
  • Examples of courses developed or re-designed.
  • Instructional materials, course textbooks, etc.
  • Examples of innovative teaching.
  • Pedagogical research.

More information on teaching portfolios can be found in Fry and Ketteridge (2003).

Conclusion

Evaluation can help towards the development of excellent teaching and a better student experience.

The key to good evaluation is to treat it as you would a research project: focus, know what you want, choose a methodology in keeping with your purpose, draw out findings which are legitimate, and consider their implications.

Traditional evaluation sheets issued post course tell us how satisfied students were, but may not reveal much about what they learned. Good evaluation should do just that. It should reveal not only who should be promoted or offered tenure, but also what the institution as a whole needs to do to support teaching.

References

Center for Excellence in Teaching and Learning (2010), "Student evaluation of teaching: Guidelines and recommendations for effective practice", Center for Excellence in Teaching and Learning, Iowa State University, available at: www.celt.iastate.edu/set/effective.html [accessed 29 April 2010].

Fox, R. (2008), "A feminist examination of studies into student evaluations of teaching: moving our voices to the center", thirdspace: a journal of feminist theory & culture, Vol. 8 No. 1, available at: www.thirdspace.ca/journal/article/view/fox/216 [accessed 29 April 2010].

Fry, H. and Ketteridge, S. (2003), "Teaching portfolios", in Fry, H., Ketteridge, S. and Marshall, S. (Eds), A Handbook for Teaching and Learning in Higher Education, Kogan Page, London, pp. 242-252.

Fullerton, H. (2003), "Observation of teaching", in Fry, H., Ketteridge, S. and Marshall, S. (Eds), A Handbook for Teaching and Learning in Higher Education, Kogan Page, London, pp. 226-241.

Gross Davis, B. (1993), "Student rating forms", Tools for Teaching, Jossey-Bass, San Francisco, available at: http://teaching.berkeley.edu/bgd/ratingforms.html [accessed 29 April 2010].

Hounsell, D. (2003), "The evaluation of teaching", in Fry, H., Ketteridge, S. and Marshall, S. (Eds), A Handbook for Teaching and Learning in Higher Education, Kogan Page, London, pp. 200-212.

Johnstone, A. (2005), Evaluation of Teaching: A Physical Sciences Practice Guide, Physical Sciences Centre, Department of Chemistry, University of Hull, available at: www.heacademy.ac.uk/assets/ps/documents/practice_guides/practice_guides… [accessed 29 April 2010].

Nuhfer, E.B. (2003), "Of what value are student evaluations?", Center for Teaching and Learning, Idaho State University.

Perillo, L. (2000), "Why I stopped reading my student evaluations", Chronicle of Higher Education, 7 July.

Pounder, J. (2007), "Is student evaluation of teaching worthwhile? An analytical framework for answering the question", Quality Assurance in Education, Vol. 15 No. 2, pp. 178-191.