
Accurate Measurements of Procedural Knowledge


Many educators, including me, place a high value on thinking skills.  But this noble ideal is rarely converted into significant action (into widely adopted curriculum & instruction), partly because of a major practical problem:

It is extremely difficult to develop assessments of higher-level thinking skills that measure knowledge accurately (with a strong correlation between a student's exam score and their level of skill);  measure appropriate knowledge (by testing the ideas-and-skills that are the educational goals);  and differentiate between levels of knowledge (by including tasks that vary in difficulty, with some that most students can do, some that only a few can do, and some in between).

It's also difficult to achieve these assessment goals with exams that can be written & graded in a reasonable amount of time.  Writing a “thinking skills” exam is difficult and requires lots of time.  So does grading this type of exam.  And this type of grading* requires subjective judgments (as on essay exams, but often more difficult) that many teachers don't enjoy.    {* as in grading difficult “thinking skills problems” or complex Design Projects }


Therefore, thinking skills usually are not tested in high-stakes exams (as in "No Child Left Behind"), and these exams are the basis for several rational reasons to not teach thinking skills.


The section below was imported from Aesop's Activities: Effective Teaching Strategies for Goal-Directed Education.  Later it will be revised for use here.


Evaluation Activities — Why, What, and How?

In most parts of the world (North America, Europe, Asia,...) education includes assessments, with student performance evaluated in the form of course grades that are based on evaluation activities, typically assignments and exams.  And sometimes (in Europe & Asia more than in the United States) the educational opportunities and professional options of students are determined by their performance on high-stakes exams.  Here are some thoughts about evaluation:



1) motivation:  A high score on an exam, or any other evaluation activity, is an extrinsic reward that will motivate students who want a good grade, although students also will study for other reasons: intrinsic, personal, and interpersonal.

2) experience:  An evaluation activity is an opportunity to gain experience with ideas and skills, so it's just a special type of thinking activity.

3) guidance:  If students are studying "for the exam" we can guide their studying by telling them what will be on the exam; in a well-designed course there is a close match between what is desired (the ideas-and-skills that are the educational goals) and what is evaluated.

4) feedback for learning:  Students can continue their old strategies for learning with confidence (if they do well on an exam) or make appropriate changes (if they do poorly) in an effort to learn more effectively.

5) feedback for teaching:  Similarly, strategies for teaching can be affected by feedback from exams when a teacher asks, "Are my instructional methods effective in helping students learn?" *

6) evaluation of students:  Most schools require grades for students, and most teachers assign grades based on the results of evaluation activities.

* For example, if an exam shows that too many students cannot solve problems requiring a mastery of some ideas-and-skills you have been “teaching”, you should ask “how can we revise our instruction so it will help students learn more effectively?”    { How many is "too many"?  Usually you want some exam questions that many students cannot answer, to avoid a ceiling effect.  But if the number of failures is unexpectedly large, after the difficulty of a question has been considered, this may indicate a weakness of instruction that should inspire a revision of instruction. }
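The “how many is too many” judgment above can be made more systematic by comparing each question's observed success rate with the difficulty the teacher intended.  Below is a minimal illustrative sketch (all numbers and question names are hypothetical, not from the source): a question is flagged as a possible sign of weak instruction only when students fall short of the expected success rate by more than a chosen margin, so intentionally hard questions are not flagged.

```python
# Illustrative sketch with hypothetical data: flag exam questions whose
# failure rate is much higher than the difficulty we expected, which may
# signal a weakness of instruction rather than an intentionally hard question.

def flag_weak_instruction(results, expected, margin=0.20):
    """results:  {question: observed fraction of students answering correctly}
    expected:    {question: intended fraction answering correctly (difficulty)}
    Returns questions where observed success falls short of the expectation
    by more than `margin`."""
    return sorted(
        q for q, actual in results.items()
        if expected[q] - actual > margin
    )

results  = {"Q1": 0.85, "Q2": 0.40, "Q3": 0.30}   # observed success rates
expected = {"Q1": 0.90, "Q2": 0.75, "Q3": 0.35}   # intended difficulty
print(flag_weak_instruction(results, expected))   # only Q2 underperforms badly
```

Here Q3 is not flagged even though most students missed it, because it was designed to be hard; Q2 is flagged because a question expected to be fairly easy was missed by most students.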



Usually it's easy to construct (and grade) an exam that tests lower-level knowledge, such as a student's ability to recall facts or solve familiar problems by applying a known method.  It's much more difficult to construct and grade exams (*) that accurately measure higher-level thinking skills by observing how well a student responds to challenges:  a novel problem requiring creative improvisation;  a complex situation that tests their ability to make evaluations based on multiple goal-criteria that cannot all be maximized, so trade-offs (with a weighing of relative advantages) are necessary;  or a situation in which conflicting causal factors are operating, which students must analyze.

But if one of our goals is to help students learn higher-level thinking skills, then making exams that test these skills can be a worthwhile investment of time and effort that will be rewarded with improved education.

* It's difficult on a small scale, for one semester in one class.  And the time-and-effort multiplies when multiple exams are necessary:  when a class will be repeated and a new exam must be constructed every semester (to prevent a “novel problem” from becoming a familiar problem), or for a sequence of classes taught by different teachers (as in a Wide Spiral Curriculum to teach thinking skills).  For a large class the total time is more, but the time-per-student is typically less.



Usually students work individually on an evaluation activity, but they can also work as a collaborative group.  With group testing, questions of fairness should be considered by asking whether some students have contributed more to the quality of a project than others, and how much “luck of the draw” (in getting partners for a group project) is involved in assigning group grades.

A test can be in-class or take-home, written (with multiple-choice or short-answer questions, problem solving, essay writing,...), oral (by answering questions, asking questions, discussing issues, evaluating policies, solving problems,...), or physical (for example, by performing a laboratory procedure).  Or a teacher can observe the quality of work (in a lab) or the quantity and quality of expressed ideas (in a discussion), or have students do a long-term project, or...


An effective exam should:

measure knowledge accurately (there should be a high correlation between a student's exam score and their level of understanding-and-skill);

measure appropriate knowledge (by testing ideas-and-skills that are the educational goals, and in a well designed course have been the focus of teaching and learning);

differentiate between levels of knowledge (by including tasks that vary in difficulty, with some that most students can do, some that only a few can do, and some in between, thus avoiding a ceiling effect or floor effect where everyone does equally well or poorly).

Achieving these goals is not easy, but the potential rewards make it a challenge worth pursuing.



I.O.U. Later, maybe in late-2022, this page will have more content, especially with commentary about the first & last paragraphs, "Many educators,..." and "Therefore,..."