Teaching Expertise in Context: How to Evaluate Teacher’s Situated Cognition?

Higher Education Evaluation and Development

*Please note this commentary is not the final published version. The Version of Record will be published in HEED Issue 16.1 in 2022*



Caspari-Sadeghi, Sima (University of Passau, Passau, Germany)

1.    Introduction

Although teaching is one of the oldest human professions, there is still no agreed-upon conception of what it means to be an expert teacher. This is not for lack of research. In fact, identifying, evaluating, training and retaining qualified or expert teachers has always been a concern of both research and practice in education. For example, as part of a nationwide movement to reform American public education, the No Child Left Behind Act (2001) mandated that a highly qualified teacher be in every classroom. Accomplishing such a laudable goal requires a definition of a "highly qualified/expert teacher", which has turned out to be unusually difficult to produce (Berliner, 2005). Without a clear and specific definition of a construct, it is not possible to operationalize and measure it objectively.

The study of expertise requires two things: (a) finding experts, and (b) defining 'tasks' that are representative of the domain and on which performance clearly distinguishes experts from non-experts. In well-defined domains, e.g. chess, sport, music, piloting, surgery, there are clear objective criteria or standards for finding experts (winning games, successful flights, accurate diagnoses) as well as representative tasks on which superior performance can be exhibited (choosing the 'best next move in the middle-game' in chess). However, teaching and some other professions, e.g. stock judgment, psychotherapy, management, belong to 'ill-structured domains': firstly, it is difficult to identify real experts; secondly, there is no specific, well-structured task that can capture fine distinctions among practitioners at different levels of performance (Ericsson & Lehmann, 1996).

Teaching is essentially a contextual activity in which teachers engage in a continuous loop of evidence, inference, action and monitoring. Therefore, this study suggests Cognitive Systems Engineering (CSE) as an alternative approach that can elicit the Data-driven Decision-Making (DDDM) skills of teachers.

2.    In Search of the Expert Teacher

We summarize the common criteria (Caspari-Sadeghi & König, 2018) used to measure expertise in teaching as (a) experience, (b) nomination, (c) student surveys, (d) value-added, and (e) performance-based criteria. As discussed below, although these criteria are practical indicators, none of them alone can reliably and validly capture expertise in teaching.

2.1.     Experience. The first standard used to measure expertise is "experience"; it is quite common in education to use 'experienced teacher' and 'expert teacher' interchangeably (Berliner, 2005; Caspari-Sadeghi & König, 2018). Although acquiring expertise in any area of human activity, from sport to science, requires practice over a long period of time, experience alone does not guarantee the development of expertise. For instance, teachers' experience has been found to have a positive relationship with learners' achievement only during roughly the first 5 years of teaching; after that the correlation becomes small (Rockoff, 2004).

2.2.     Social recognition. Another prevalent criterion is “nomination” by colleagues. A common way to identify expert teachers is to ask heads of school, principals or colleagues to nominate them. However, people recognized by their peers as experts do not always display superior performance on domain-related tasks. Sometimes they are no better than novices even on tasks that are central to their expertise (Ericsson & Lehmann, 1996). The distinction between a "perceived expert" and an "actual expert" should be demonstrated and measured objectively (McCloskey, 1990).

2.3.     Student survey. Although it can be a rich source of feedback about teacher performance, relying solely on students' opinions, attitudes and experience, collected through surveys or questionnaires, is an incomplete and questionable approach. The major problem is the "pseudo-expertise" one develops as a student. Before people start their formal training as teachers, they spend over 10,000 hours as students in the classroom, which gives teaching the longest apprenticeship of any profession. One consequence is that everyone in our society, including teachers, thinks they already know what an expert teacher is. However, judging the quality of performance in other domains requires years of systematic instruction, practice, and accreditation, whereas students receive no systematic training in how to reliably and validly observe and rate their teachers (Stigler & Miller, 2018). Furthermore, it is not clear whether students can distinguish between rating ‘teacher quality’ (individual characteristics and behavior) and ‘teaching quality’ (instructional strategies and practices).

2.4.    Value-added Measure. A very popular approach to measuring teaching expertise is to calculate the difference between students' prior achievement and their achievement on year-end standardized tests. Any value added to the gain score is then attributed to the teacher. However, this approach has some shortcomings (Holloway-Libbell & Amrein-Beardsley, 2015). Firstly, the impact of a teacher cannot be accurately isolated from other variables, e.g. student motivation, attendance and effort, parental involvement and educational or economic level, home tutoring, etc. For instance, it is claimed that 90% of the variation in student gain scores is not under the control of the teacher but is due to student-level factors (Schochet & Chiang, 2010). Secondly, studies that correlate value-added scores with teacher characteristics show mixed results, with many teacher variables lacking strong predictive power (Goe, 2007). For example, teachers who produce the strongest gains on achievement tests are not necessarily the ones who succeed at reducing absences and suspensions, variables shown to predict students' future professional achievements (Jackson et al., 2014). The Measures of Effective Teaching (MET) project, sponsored by the Bill and Melinda Gates Foundation, is an ambitious empirical study of the relationship between teachers' performance and students' test scores, which collected more than 20,000 videotaped lessons from 3,000 teachers in the U.S. Although some teachers appeared to be more effective than others in producing better test scores, the observational measures applied to the videos of classroom teaching disappointingly yielded very little of note about a direct relationship between any one type of teacher activity and variance in students' learning (Kane & Staiger, 2012).
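The gain-score logic described at the start of this subsection can be made concrete with a deliberately naive sketch. It is an illustration only, not the statistical model used in any of the cited studies: the function name `naive_value_added`, the record layout and the sample scores are all hypothetical, and real value-added models add covariates and error adjustments that are omitted here.

```python
from collections import defaultdict
from statistics import mean


def naive_value_added(records):
    """Return each teacher's mean student gain relative to the overall mean gain.

    records: iterable of (teacher, prior_score, year_end_score) tuples.
    This is a hypothetical, simplified calculation: it attributes the whole
    gain to the teacher and ignores student-level factors entirely.
    """
    gains = defaultdict(list)
    for teacher, prior, year_end in records:
        gains[teacher].append(year_end - prior)  # raw gain score per student

    all_gains = [g for teacher_gains in gains.values() for g in teacher_gains]
    overall = mean(all_gains)

    # "Value added" here is simply the teacher's mean gain minus the overall mean.
    return {teacher: mean(gs) - overall for teacher, gs in gains.items()}


# Invented sample data, for illustration only.
records = [
    ("teacher_a", 60, 70),
    ("teacher_a", 55, 63),
    ("teacher_b", 62, 66),
    ("teacher_b", 58, 64),
]
scores = naive_value_added(records)
```

Because nothing in this calculation separates the teacher's contribution from motivation, attendance or home support, the resulting number inherits exactly the attribution problem raised above.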

2.5.     Performance-based criteria. The last criterion is “performance-based”, which can be of two types: (a) measuring teachers' knowledge via standardized tests, and (b) measuring teachers' effectiveness in the classroom through observation checklists or video recording. Though informative, both approaches have their limitations. Standardized tests that measure different aspects of teacher knowledge (Shulman, 1987), such as content knowledge, pedagogical content knowledge, and general pedagogical knowledge, mostly focus on 'inert knowledge': they measure memory-based, de-contextualized aspects of teachers' declarative/explicit knowledge rather than procedural knowledge or situation-specific skills of perception, interpretation, and decision-making (Depaepe et al., 2013). The main problem with such paper-based knowledge tests is that the key to effective teaching is not whether you know something (declarative knowledge), but whether you are able to access and apply that knowledge (procedural knowledge) when you need it to improve students' learning opportunities. Additionally, paper tests lack ecological validity: test items, mostly brief and simple, cannot represent the complex tasks that a teacher must perform in a dynamic, multifaceted real situation (Larrabee & Crook, 1996).

Observing and recording teaching performance in search of evidence of a superior 'best practice' that could define an expert teacher has turned out to be difficult as well. Video-recorded data gathered in the Third International Mathematics and Science Study (TIMSS) showed no consistent effects of teacher practices or characteristics on student achievement, except for problem-solving (Akyüz & Berberoglu, 2010). Findings indicated striking homogeneity of teaching practice within high-achieving countries, but marked differences in practice across countries. For example, because Japan is a top-ranked country in math, one might expect Japanese teaching routines to be similar to those used in other high-achieving countries, such as Switzerland, Hong Kong, or the Netherlands, which was not the case (Stigler & Hiebert, 2004). The reason could be that teaching expertise cannot be defined in terms of either selecting a 'decontextualized best practice' on a test or performing it in a course, since actual expertise lies in constantly reading the situation, monitoring progress or problems, and making the necessary adjustments and decisions in real time (Stigler & Miller, 2018).

3.    Towards an Alternative Approach

Studies in the psychology of expertise and on expert systems in Artificial Intelligence suggest that a shift in ‘perception, reasoning, and decision-making’ is responsible for the move from novice to expert across professional domains. We hypothesize that the same shift is involved in becoming an expert teacher.

Perception (pattern recognition) is the ability to rapidly apprehend underlying causal variables, meaningful similarities and abnormalities in the context. It facilitates recognizing the type of problem and its level of difficulty (Landy, 2018). This leads to efficient reasoning/judgement (assessing which of the alternative solutions or courses of action fits best). Studies have shown that decisions are made differently at different stages of expertise. Experts rely on their schemata, mental representations of already encountered cases that are stored in long-term memory and can be activated by perceptual/situational cues, which leads to a process called 'Recognition-primed decision-making' (Lintern et al., 2018). Novices, by contrast, are overwhelmed and distracted by irrelevant, superficial cues and fail to recognize the main problem; they rely on their working memory, which leads them to impose already learned explicit instructional theories (declarative knowledge) on the problem-solving process (Ward et al., 2011).

To illustrate this (see Figure 1), we draw on the architecture of Expert Systems, programs designed to emulate human expertise and operate at its level. These systems have two key components: a knowledge base and an inference engine (reasoning). The knowledge base of an expert system contains both factual (know-that) and heuristic (know-how) knowledge (Davis et al., 1993). The inference engine is the machinery that applies that knowledge to the task at hand. Factual knowledge can be measured via paper-and-pencil tests. However, the major part of skilled performance is due to heuristic knowledge, which is experiential, procedural, more judgmental and tacit. These aspects of expertise, namely heuristic knowledge and the inference engine, can be measured by the methods of Cognitive Systems Engineering (CSE).

Figure 1. Expert Knowledge Components
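The split between a knowledge base and an inference engine can be sketched in a few lines. The following is a minimal forward-chaining illustration under stated assumptions, not a description of any particular expert system: the function `forward_chain`, the cue names and the if-then rules are all hypothetical and were invented to mirror a classroom situation.

```python
def forward_chain(facts, rules):
    """A tiny inference engine: apply if-then rules until nothing new follows.

    facts: iterable of strings that are known to be true (observed cues).
    rules: list of (conditions, conclusion) pairs; a rule fires when all of
           its conditions are already known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical heuristic (know-how) rules a teacher might hold tacitly.
rules = [
    (("many_wrong_answers", "same_error_pattern"), "shared_misconception"),
    (("shared_misconception",), "reteach_concept"),
]

# Hypothetical factual input: cues observed in the classroom.
observed = {"many_wrong_answers", "same_error_pattern"}
derived = forward_chain(observed, rules)
```

In this sketch the rules stand in for the knowledge base and the loop stands in for the inference engine; with only one of the two cues present, no rule fires and no action is derived.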

4.    Cognitive Systems Engineering (CSE)

Cognitive Systems Engineering (CSE) can be employed to elicit and represent heuristic and tacit knowledge. CSE is a professional discipline that emerged from traditional Human Factors; it guides the analysis, modeling, design, and evaluation of complex sociotechnical systems so that the cognitive work involved will be more efficient and robust (Hollnagel & Woods, 2005). It offers methods for knowledge elicitation and knowledge representation by identifying the cognitively relevant structures and processes involved in performing a task and how they relate to each other. The ultimate design targets include software and hardware, training systems, organizations, and workplaces. CSE was first used in the aftermath of the Three Mile Island accident (1979) as a practical diagnostic tool in engineering; it has since built a record of success in several areas: nuclear power plant operation, fire command, neonatal intensive care, medicine and autonomous air vehicles (Dominguez et al., 2015; Moon et al., 2014; Woods & Roth, 1986).

Cognitive Task Analysis (CTA), a branch of CSE, is based on compelling evidence that experts are not fully aware of about 70% of their own decisions or mental processes and are therefore unable to explain them effectively (Clark et al., 2008). It involves a variety of well-specified techniques to elicit and describe the knowledge (declarative 'know-that' and procedural 'know-how'), skills, cognitive styles/processes and learning hierarchies involved in solving a given task (Crandall et al., 2006). These techniques include, e.g., Concept Mapping, Think-aloud Protocol Analysis, Critical-incident Analysis, and Concepts, Processes and Principles. The following describes a knowledge elicitation technique called the Critical Decision Method (Smith & Hoffman, 2018).

4.1.    Knowledge elicitation via Critical Decision Method (CDM)

The Critical Decision Method (CDM) uses a retrospective, case-based approach to gather information about perception/pattern-recognition and decision-making skills at different levels of expertise. It invites participants to recount a recently experienced "tough case" that involved making a difficult decision and challenged their expertise. Probe questions focus on the recall of specific, lived experience. First, the participant provides an unstructured account of the incident, from which a timeline is created. Next, the analyst and the participant identify specific points in the chronology at which decisions were made. The decision points are then probed further using questions that elicit details about significant cognitive processes and states: (a) the perceptual cues or situational awareness used in making the decision, (b) the prior knowledge or skills that were applied, (c) the goals considered, and (d) the decision alternatives and why they were not chosen (Hoffman, 2008).
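One way an analyst might record the output of a CDM interview is sketched below. This is only an assumed data layout for illustration, not part of the published method: the class names `Incident` and `DecisionPoint`, the field names and the example content are hypothetical, and the four probe fields simply mirror items (a) to (d) above.

```python
from dataclasses import dataclass, field

# The four probe categories, mirroring items (a) to (d) in the text.
PROBES = ("cues", "prior_knowledge", "goals", "alternatives")


@dataclass
class DecisionPoint:
    """One point on the incident timeline at which a decision was made."""
    time_label: str
    decision: str
    cues: list = field(default_factory=list)             # (a) perceptual cues
    prior_knowledge: list = field(default_factory=list)  # (b) knowledge/skills applied
    goals: list = field(default_factory=list)            # (c) goals considered
    alternatives: list = field(default_factory=list)     # (d) options not chosen, and why

    def unanswered_probes(self):
        """Return the probe categories the interview has not yet covered."""
        return [probe for probe in PROBES if not getattr(self, probe)]


@dataclass
class Incident:
    """The participant's unstructured account plus the timeline built from it."""
    account: str
    timeline: list = field(default_factory=list)

    def add_decision_point(self, point):
        self.timeline.append(point)
        return point


# Invented example content, for illustration only.
incident = Incident(account="A lesson on fractions stalled halfway through.")
point = incident.add_decision_point(
    DecisionPoint(
        time_label="minute 12",
        decision="paused the lesson to re-explain equivalent fractions",
        cues=["several students stopped writing"],
    )
)
remaining = point.unanswered_probes()
```

A structure of this kind lets the analyst see at a glance which decision points still need probing before the interview ends.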

In the micro-context of the classroom, Data-Driven Decision-Making (DDDM) refers to the continuous use of data to plan, implement, monitor and re-adjust action. An expert teacher uses relevant evidence from various sources, e.g. observation, questioning, comments, discussions, tests, exams, quizzes, assignments, performance tasks, portfolios, projects, etc., to identify gaps in understanding, lack of background knowledge, misconceptions and misunderstandings, and flexibly adapts instruction to learners' needs, preferences and ‘momentary contingency’ (Black & Wiliam, 2018; Mandinach & Jackson, 2012).

5.    Conclusion and Implications 

This paper discussed the insufficiency of the available criteria, e.g. experience, tests, nomination, etc., for measuring teaching expertise. CSE is suggested as a knowledge elicitation approach to uncover elements of expert reasoning such as decision types, decision strategies, decision requirements, information triggers and hidden assumptions (Crandall et al., 2006). Applied to teaching, CSE assumes that teachers use DDDM at the classroom level by attending to data on student learning and using the relevant evidence to continuously guide progress, monitor achievement and adapt teaching to contextual variables.

Once we know the underlying mediating mechanisms by which experts organize their knowledge and utilize it to achieve superior performance (decision-making), it becomes possible to improve the efficiency of learning by designing better developmental environments that increase the proportion of performers who reach a higher level of expert performance (Ericsson et al., 2018).

Studies in other professions have shown that training at higher levels requires different methods of instruction, e.g. simulations, scenarios, and problem-solving and decision-making exercises. Currently, there is scarcely any Instructional Design (ID) based on the empirically derived knowledge and skills of expert teachers. The outcomes of CSE experiments can be employed to design an 'Expert Performance-based Training' (ExPerT) program. Complex domains, such as the military, piloting, sport and medicine, have already introduced such training and reported immense success.



Akyüz, G., & Berberoglu, G. (2010). Teacher and classroom characteristics and their relations to mathematics achievement of the students in the TIMSS. New Horizons in Education, 58(1), 77–95.

Berliner, D. C. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56, 205-213.

Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education: Principles, Policy & Practice, 25(6), 551–575.

Caspari-Sadeghi, S., & König, J. (2018). On the Adequacy of Expert Teacher: from Practical Convenience to Psychological Reality. International Journal of Higher Education.

Clark, R. E., Feldon, D., van Merriënboer, J., Yates, K., & Early, S. (2008). Cognitive task analysis. In Handbook of research on educational communications and technology (3rd ed., pp. 577–593). Mahwah, NJ, US: Lawrence Erlbaum Associates.

Crandall, B., Klein, G., & Hoffman, R. R. (2006). Working minds: A practitioner’s guide to Cognitive Task Analysis. Cambridge, MA: MIT Press.

Davis, R., Shrobe, H., & Szolovits, P. (1993). What Is a Knowledge Representation? AI Magazine, 14, 17-33.

Depaepe, F., Verschaffel, L., & Kelchtermans, G. (2013). Pedagogical content knowledge: A systematic review of the way in which the concept has pervaded mathematics educational research. Teaching and Teacher Education, 34, 12-25.

Dominguez, C., Strouse, R., Papautsky, E. L., & Moon, B. (2015). Cognitive design of an application enabling remote bases to receive unmanned helicopter resupply. Journal of Human-Robot Interaction, 4, 50-60.

Ericsson, K. A., & Lehmann, A.C. (1996). Expert and exceptional performance: evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273-305.

Ericsson, K. A., Hoffman, R. R., Kozbelt, A., & Williams, A. M. (2018). The Cambridge handbook of expertise and expert performance (2nd Ed.). New York: Cambridge University Press.

Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. National Comprehensive Center for Teacher Quality.

Hoffman, R. R. (2008). Human factors contributions to knowledge elicitation. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50, 481-488.

Hollnagel, E. & Woods, D. D. (2005). Joint cognitive systems: Foundations of cognitive systems engineering. Boca Raton, FL, USA: Taylor & Francis.

Holloway-Libbell, J., & Amrein-Beardsley, A. (2015). ‘Truths’ devoid of empirical proof: Underlying assumptions surrounding value-added models in teacher evaluation. Teachers College Record, 18008.

Jackson, C. K., Rockoff, J. E., & Staiger, D. O. (2014). Teacher effects and teacher-related policies. Annual Review of Economics, 6, 801-825.

Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: MET Project of Bill and Melinda Gates Foundation.

Landy, D. (2018). Perception in expertise. In K. A. Ericsson, R. R. Hoffman, A. Kozbelt, & A. M. Williams (Eds.), Cambridge handbooks in psychology. The Cambridge handbook of expertise and expert performance (pp. 151–164). Cambridge University Press.

Larrabee, G. J., & Crook, T. H. (1996). The ecological validity of memory testing procedures: Developments in the assessment of everyday memory. In R. J. Sbordone & C. J. Long (Eds.), Ecological validity of neuropsychological testing (pp. 225–242). GR Press/St. Lucie Press, Delray Beach, FL.

Lintern, G., Moon, B., Klein, G., & Hoffman, R.R. (2018). Eliciting and representing the knowledge of experts. In K. A. Ericsson, R. R. Hoffman, A. Kozbelt, & A. M. Williams (Eds.), Cambridge handbooks in psychology. The Cambridge handbook of expertise and expert performance (p. 151–164). Cambridge University Press.

Mandinach, E. B., & Jackson, S. S. (2012). Transforming teaching and learning through data driven decision making. Corwin Press.

McCloskey, D. N. (1990). If you're so smart: The narrative of economic expertise. Chicago: University of Chicago Press.

Moon, B., Hoffman, R.R., Lacroix, M., Fry, E., & Miller, A. (2014). Exploring macro cognitive healthcare work: Discovering seeds for design guidelines for clinical decision support. In T. Ahram, W. Karwowski and T. Marek, Proceedings of the 5th International Conference on Applied Human Factors and Ergonomics.

Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94, 247-252.

Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57, 1-23.

Smith, P., & Hoffman, R.R. (2018). Cognitive Systems Engineering: The future for a changing world. Boca Raton, FL: Taylor & Francis.

Stigler, J. W., & Miller, K. F. (2018). Expertise and expert performance in teaching. In K. A. Ericsson, R. R. Hoffman, A. Kozbelt, & A. M. Williams (Eds.), Cambridge handbooks in psychology. The Cambridge handbook of expertise and expert performance. Cambridge University Press.

Stigler, J. W. & Hiebert, J. (2004). Improving mathematics teaching. Educational Leadership, 61, 12-17. 

Ward, P., Suss, J., Eccles, D. W., Williams, A. M., & Harris, K. R. (2011). Skill-based differences in option generation in a complex task: A verbal protocol analysis. Cognitive Processing: International Quarterly of Cognitive Science, 12, 289-300.

Woods, D. D., & Roth, E. M. (1986). Models of cognitive behavior in nuclear power plant personnel. Washington, DC: U.S. Nuclear Regulatory Commission.