Metacognition - Outline
Causal Effects of Conscious Experience
Broadest Case:
• Mental states → Behavior
I want a beer, I think beer is in the fridge → I open the fridge
A Narrower Case (Metacognition)
• Knowledge about cognition → control of cognition
I believe I will not remember the name of the person I
just met → I may take special measures to commit her
name to memory (think that “Rita” rhymes with “pita”).
• Knowledge about cognition (a conscious experience).
– Knowledge about tasks (e.g., ‘rote vs deep encoding’)
– Knowledge about persons (e.g., ‘I am not good at this’)
• Control of cognitive processes. For example,
– Which strategy to use
– How much time to allocate to studying
• Monitoring of cognitive processes. For example,
Ease of Learning: “learning names is difficult”
Judgment of Learning (JoL): “I won’t remember that name”
Feeling of Knowing (FoK): “I feel that I know her name”
Confidence Judgment: “I’m not sure, it may be ‘Rita’”
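[Figure: monitoring judgments (ease of learning, judgment of learning, feeling of knowing, confidence in retrieved answer) and control processes (e.g., study time) laid out across stages labeled In Advance of Learning, Ongoing Learning, Maintenance of ..., Self-directed ..., Output of ...]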
Nelson & Narens, 1990
Judgment of Learning (JoL)
JoL: “Will I be able to remember this at a later time (at test)?”
- Study this pair: Captain - Carbon
- Make a JoL: “How likely is it that I will remember the target word
that went with ‘Captain’?” 25% 50% 75% 100%
- Test: Captain ________
- JoL correlates with recall - if I think I know it, it’s likely I do know it
- but this correlation is far from perfect (usually less than .50)
You can also have aggregate JoL at the end of the list.
Monitoring Effectiveness: JoL & recall
• Resolution (aka discrimination accuracy or relative accuracy)
– The extent to which the subject is able to distinguish between
answers that are more likely or less likely to be correct.
• Calibration
– Whether there is overconfidence or underconfidence
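Resolution is often quantified in this literature with a Goodman-Kruskal gamma correlation between item-by-item judgments and later recall. These notes do not name a statistic, so the choice of gamma is an assumption, and the data below are made up for illustration. A minimal sketch:

```python
from itertools import combinations


def goodman_kruskal_gamma(judgments, outcomes):
    """Return (C - D) / (C + D) over all item pairs; tied pairs are ignored."""
    concordant = 0
    discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(judgments, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1  # higher judgment went with better outcome
        elif product < 0:
            discordant += 1  # higher judgment went with worse outcome
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data: JoL ratings (percent) and later recall (1 = recalled).
jol = [25, 50, 75, 100, 50, 75]
recalled = [0, 0, 1, 1, 1, 0]
gamma = goodman_kruskal_gamma(jol, recalled)
print(gamma)
```

A gamma near 1 means high-JoL items were the ones recalled (good resolution); a gamma near 0 means the judgments did not discriminate. Gamma says nothing about calibration: a subject can rank items well and still be overconfident overall.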
How do we judge what we know and what
we don’t?
That is, what cues are used for JoL?
• A hint to this question comes from studies looking at
immediate JoL vs. delayed JoL
• Immediate JoL
– while seeing word pair (spoon – cuiller)
– Correlation with recall was low (r = .35)
• Delayed JoL:
– At the end of list
– seeing only cue (spoon - ___)
– Excellent correlation with recall (r = .90)
– 2nd graders and older adults also provide good delayed JoLs
Nelson & Dunlosky, 1991
• So it seems that we (mis)use the target to assess the
cue effectiveness
• To directly test this idea
– Water – ocean
– Chicken – penguin
• (Nelson & Koriat)
• Mnemonic cues: cues that give rise to the ‘feeling’ of an
item having been encoded.
• Two types of mnemonic cues:
– Intrinsic cues: Properties of the items that (subjects believe to)
influence memory (e.g., high vs. low frequency words)
– Extrinsic cues: Conditions of the task that (subjects believe to)
influence memory (e.g., repetition)
• Subjects underestimate some cues (e.g., repetition)
• Subjects incorrectly believe that low frequency words
would be harder to recognize than high frequency ones
Relation between JoL & study time
Can Judgment of Learning control how long we study?
- Study English - French pairs: Pen - Stylo
- Free allocation of study time
- Make a JoL: How likely is it that you will recall “stylo”
when presented with ‘Pen’?
- Test: Pen ________
1. Students allocated more study time to items that were judged to be
more difficult.
2. They still remembered more of the easy ones (labor-in-vain): they did not
compensate adequately for difficulty.
Is this the causal effect of conscious experience on behavior?
(Nelson & Leonesio, 1988)
Relation between JoL & study choice
Can we use Judgment of Learning to effectively choose
what to study?
Two types of essay (within-subject):
- easy (e.g., why we need to take vitamins)
- hard (post-modern interpretations of neo-classical fiction)
Amount of study time available
- One group was given a limited amount of time
- The other group was given a great deal of time
Results: Adaptive use of study time.
When time was limited: subjects focused on easy items
When time was unlimited: subjects focused on hard items
Original study with Ivy League students; replicated with inner-city
public schools in NY (6th grade)
Son and Metcalfe (2000)
Feeling of Knowing (FoK)
1. Recall:
What is the capital of Australia? ____
2. FoK Judgment: I am __ % sure I will recognize its name
3. Recognition: Is it:
B) Melbourne
C) Canberra
D) Perth
FoK accuracy usually ranges from .35 to .60
Similar range for 6-year-olds & for older adults
Tip of the Tongue (ToT)
ToT: “I know this word and I will retrieve it soon”
- What is the last name of the first person to set foot on the moon?
- What is the name of the large flightless bird from Africa?
- What is the name of the religious group from Northern India whose
men wear large turbans wrapped around their heads?
- correlates with recall & recognition
- correlates with partial information about the target
- but these correlations are far from perfect
- ToT seems to be caused by the familiarity of the topic
and the accessibility of partial information
- The same metaphor is used in many languages
Monitoring and control processes in ‘free-report’ memory performance
• Performance in memory tasks depends on:
– Memory per se
– Memory monitoring (i.e., the subjective assessment that the
answer that comes to mind is correct)
– Memory control (e.g., decide to report or withhold the answer
based on confidence and consequences)
- “answer 4 out of 5 questions”: the student with better metacognitive
skills will do better (other things being equal).
- multiple choice: is there penalty or not?
- “memoirs” vs. “jury”: In a capital punishment case we may
withhold memories that we deem acceptable to voice in a
memoir, because in the former the potential cost of error is high
Koriat & Goldsmith, 1996
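[Figure: model of free-report memory performance. An input question yields a best candidate answer with assessed probability Pa; situational demands (report option, accuracy incentive) set the response criterion probability Prc; the decision step is the comparison Pa > Prc?]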
Factors that contribute to free-report memory performance
• Overall retention (memory per se)
• Monitoring effectiveness: the extent to which the
assessed probabilities successfully differentiate correct from
incorrect candidate answers.(*)
• Control sensitivity: the extent to which withholding or
volunteering is in fact based on monitoring output
• Response criterion setting: whether the probability
threshold is set in accordance with the payoff schedule.
* Being able to distinguish what you know from what you
don’t know.
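The report decision described above can be sketched as a simple threshold rule: volunteer the best candidate only when its assessed probability exceeds the response criterion. The sketch below is an illustration, not the authors' implementation. The item data, the function name `free_report`, and the exact scoring of quantity (correct answers over all questions) and accuracy (correct answers over reported answers) are assumptions made for the example.

```python
def free_report(candidates, criterion):
    """Report a best-candidate answer only if its assessed probability (Pa)
    exceeds the response criterion (Prc); return (quantity, accuracy)."""
    reported = [c for c in candidates if c["pa"] > criterion]
    n_correct = sum(c["correct"] for c in reported)
    # Quantity: correct answers relative to all questions asked.
    quantity = n_correct / len(candidates)
    # Accuracy: correct answers relative to answers actually reported.
    accuracy = n_correct / len(reported) if reported else None
    return quantity, accuracy


# Hypothetical items: assessed probability and whether the answer is right.
items = [
    {"pa": 0.95, "correct": True},
    {"pa": 0.80, "correct": True},
    {"pa": 0.60, "correct": False},
    {"pa": 0.55, "correct": True},
    {"pa": 0.30, "correct": False},
]

forced = free_report(items, criterion=0.0)  # everything is reported
strict = free_report(items, criterion=0.7)  # high accuracy incentive
print(forced, strict)
```

With these made-up items, raising the criterion trades quantity for accuracy. The gain in accuracy depends on monitoring: if Pa did not separate correct from incorrect candidates, a stricter criterion would cost quantity without improving accuracy.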
Testing the model
(Koriat & Goldsmith, 1996, exp. 1)
• General knowledge questions
– Phase 1:
• Forced-report (forced recall)
• Confidence rating
– Phase 2: (same items)
• Free report (free recall)
• Half of the subjects with high accuracy incentive (penalty)
• Results
– Good monitoring & control
• high correlation between accuracy in forced recall and confidence
• Very high correlation between confidence and report in phase 2
– Adaptive response criterion
• People withheld more items under the penalty (but there was a
quantity/accuracy trade-off)
Testing the model
(Koriat & Goldsmith, 1996, exp. 2)
• Same method, except that
• Monitoring was manipulated with two types of items:
– Typical items (good confidence-accuracy correlation)
– Deceiving items (items on which people are usually sure but wrong)
• What is the capital of Australia?
• Results
– Unlike in the ‘good’ monitoring condition (typical items), in the ‘bad’
monitoring condition (deceiving items):
• free-report did little to increase accuracy, and
• the accuracy-quantity tradeoff was much larger than for
typical items
• Monitoring might also be impaired:
– in special populations: Children, Korsakoff, frontal patients.
– Due to priming
Goldsmith & Koriat, Toronto 05
Underconfidence with practice
Study times
Witness testimony
- While deliberating, jurors rely upon their
memories of the trial (availability heuristic)
- Jurors who are very confident about their
memories have the largest impact during
deliberation (Kassin & Wrightsman, 1988)
- But is that likely to be more convincing?
- However, this is based on the assumption that
confidence is correlated with accuracy. Is it?
Witness testimony (cont’d)
- Open ended questioning and free recall are
preferable to direct questioning and recognition
tests, which tend to contaminate memories.
- Witnesses should be reassured that “I don’t
remember” is an acceptable answer.
- See Hunt & Ellis textbook
• EOL: inferential (performance prediction)
– Prediction of one’s memory span
– 5-year-olds are overconfident
– May be due to unfamiliarity (they are much better at
predicting how far they can jump)
– At grade 4, calibration improves on the second
prediction (but not at grade 3)
– Maybe kids confound prediction with wishful
thinking (what I want to get), as they do better
at predicting others’ performance
Cues to FoK
Metamemory affects memory
• Imagine a multiple choice test in which
There are 5 choices
1 point if the answer is correct
0 points if there is no answer (omission)
-.25 points if the answer is wrong
– Instructions: “wild guesses will be penalized”
• Problems:
– Instructions are vague:
• what counts as ‘wild’
• Cultural, gender, & personality biases in risk taking (control)
– Even with precise instructions there may be individual
differences in monitoring (e.g., overconfident)
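The scoring rule above implies a break-even confidence level: answering pays off, on average, only when the probability of being correct exceeds penalty / (reward + penalty). The sketch below works through that arithmetic. It assumes a test-taker who simply maximizes expected points, which is an idealization; the function names are hypothetical.

```python
def expected_score(p_correct, reward=1.0, penalty=0.25):
    """Expected points from answering when the answer is right with
    probability p_correct (an omission always scores 0)."""
    return p_correct * reward - (1 - p_correct) * penalty


def break_even_probability(reward=1.0, penalty=0.25):
    """Probability of being correct at which answering and omitting
    have the same expected score (zero)."""
    return penalty / (reward + penalty)


blind_guess = expected_score(1 / 5)    # pure guess among 5 choices
threshold = break_even_probability()   # criterion implied by the payoffs
informed = expected_score(0.5)         # two options eliminated... or just 50% sure
print(blind_guess, threshold, informed)
```

Under these payoffs a blind guess among 5 choices has an expected score of about zero, so the penalty only cancels pure guessing. Any answer the student believes is more than 20% likely to be right is worth giving, but acting on that requires accurate monitoring (knowing the probability) and appropriate control (setting the criterion at the right level).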
Are most confident jurors also the most reliable ones?
• view videotape of actual murder trial
• make a global JoL (“I will remember __% of events in the trial”)
• Answer questions about the trial & provide a confidence
judgment for each answer
• JoL did not correlate with accuracy of response
• More confident jurors speak up (even though their
accuracy is no better than others’ accuracy)
– This may explain why deliberation does not enhance accuracy
– May also occur in study groups, in which the overconfident premed guy runs the show despite not having done the readings
Pritchard & Keenan, 1999, JEP:Applied
Confidence judgments
• An example of a confidence judgment is when you mark
a question in the MC exam so you can go back to it at a
later time
• The typical procedure is to ask a general knowledge
question in a multiple choice format
– What is the capital of North Carolina? A. Charlotte, B. Raleigh
– Provide a confidence judgment: 50% = guessing 100% = certain
– Confidence is positively correlated with accuracy (resolution), but
Calibration of judgments
• Calibration = mean judgment – mean performance
• For confidence judgments
– There is over-confidence in hard items and
– There is under-confidence in easy ones
– People are poorly calibrated, suggesting they are insensitive to
how much they know, but why?
– One possibility is confirmation bias: People might consider
reasons why an answer may be correct and fail to consider why
the answer might be wrong
– When people are asked to provide reasons why their answer
might be incorrect, their judgments become much better
calibrated (Koriat ’80)
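The calibration measure defined above (mean judgment minus mean performance) can be computed separately for hard and easy items to show the pattern described. The data below are invented for illustration, and the function name `calibration_bias` is hypothetical.

```python
def calibration_bias(confidence, correct):
    """Mean judgment minus mean performance, both as proportions.
    Positive values indicate overconfidence, negative underconfidence."""
    mean_confidence = sum(confidence) / len(confidence)
    mean_accuracy = sum(correct) / len(correct)
    return mean_confidence - mean_accuracy


# Hypothetical confidence ratings (0.5 = guessing, 1.0 = certain on a
# two-alternative question) and outcomes (1 = correct).
hard_confidence = [0.8, 0.7, 0.9, 0.6]
hard_correct = [1, 0, 0, 1]
easy_confidence = [0.8, 0.9, 0.7, 0.8]
easy_correct = [1, 1, 1, 1]

hard_bias = calibration_bias(hard_confidence, hard_correct)
easy_bias = calibration_bias(easy_confidence, easy_correct)
print(hard_bias, easy_bias)
```

In this made-up example, confidence is similar for both item sets while accuracy differs, so the bias is positive for hard items and negative for easy ones.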
JoL & FoK
• Are JoL & FoK based on the same info? Is r > 0?
• 20 paired associates, cued recall, learnt to criterion
– Half the pairs needed to be recalled correctly once
– The other half needed to be recalled 4 times (overlearnt)
– After item reached criterion -> immediate JoL (affected by
– 4 week retention interval
– Recall test (better for overlearnt items) (relative acc of JoL = .3)
– FoK to each non-recalled item (unaffected by overlearning;
relative acc of FoK = .2)
– Recognition test of non-recalled items
– Correlation between JoL & FoK = .17 (very low)
Leonesio & Nelson, 1990
How do we come up with a metamemory
judgment? The Direct-Access Hypothesis
– Things that we store in our mind have a memory trace
– Some things have stronger trace than others
– Our JoL, FoK, & confidence judgments are based on
those traces.
– In the case of the FoK we cannot access a memory
but even in those cases we can experience the trace.
Evidence against the Direct-Access
• general knowledge questions, followed by FoK,
and recognition of unrecalled items
• Two types of questions
– Standard items:
• relative accuracy of FoK was .35
• In other words, FoK predicted recognition
– Deceiving items (e.g. ‘name the capital of California’)
• relative accuracy of FoK was 0
• In other words, people were bad at predicting recognition
Koriat, 1995
More evidence against the Direct-Access
• Trivia questions (semantic memory)
– Answers were grouped based on how long it took subjects to
respond (a proxy for difficulty)
• Easy items required little processing of the event (episodic memory)
• Hard items required more elaboration (deeper processing)
• Immediate JoL re: free recall in 20 mins
• Test: Free recall of answers (episodic recall)
– Because hard items required deeper processing, they are better recalled
– If subjects relied on direct-access, harder items should receive
higher JoL (but they don’t!)
– This suggests subjects relied on an effort heuristic (hard -> low JoL)
Benjamin, Bjork, & Schwartz, 1998
More evidence against the Direct-Access
• List of Words
– High frequency words
– Low frequency words Which type will be better recognized?
• Immediate JoL re: recognition
• Recognition Test
– Low frequency words are easier to recognize
– If subjects relied on direct-access, low frequency
words should receive higher JoL (but they don’t!)
– This suggests subjects relied on (incorrect) beliefs
about the effect of word frequency
Benjamin, , 1998
Cue-Familiarity Hypothesis of metacognitive
• FoK are based on familiarity with the cue (question),
rather than memory traces of the target (answer)
• If FoK is high, then people attempt target retrieval
• Since cue familiarity is likely to be related to target
familiarity, it is somewhat predictive (thus the finding of
FoK relative accuracy)
• Priming subjects with the cues leads to increased FoK
(without an increase in recognition)
Reder, 1987
Accessibility Hypothesis of metacognitive
• FoK are based on accessibility of the target
• It is different from direct access hypothesis in that
– Information IS accessed
– The accessed information need not be accurate (tricky
– the fluency with which the information is accessed is important
(e.g., trivia study, Benjamin et al.)
Koriat, 1993
Cue-Familiarity vs. Target Accessibility
• Proactive interference task with three conditions
– AB AB: familiar cue; fluent access to target
– AD AB: familiar cue; difficult access to target
– CD AB: novel cue; modest access to target
• Test
– Paired associate recall
– FoK for incorrectly recalled items
– Recognition test
• Results favored the cue-familiarity hypothesis
• However, both hypotheses may be true, as they are not
mutually exclusive. Cue familiarity acts first; if it is high,
people attempt retrieval, at which point accessibility
influences FoK
Metcalfe, Schwartz, & Joaquim 1993
How well do students judge their learning?
• Study a paragraph
• Make a global JoL
• Recall
• Low correlation (.27)
Inputs for metamemory judgments
• Immediate JoL
– Ease of processing during study (Begg et al., 1989;
Hertzog et al., 2003, JEP:LMC 29, 22-34)
– How related cue & target are (Koriat, 1997)
• FoK
– Cue familiarity (Reder)
– Accessibility of partial information
Inputs for JoL
• Aspects of study processes
– Imagery vs. repetition
– Ease of processing
• Item characteristics
– Pair relatedness
– Concrete or abstract words
– Item frequency/familiarity
• Context of study
– Number of study trials
– Serial position of items
– Luminance of items
Inputs for Confidence Judgment
• For recalled items
– Latency of recall
– Item characteristics
– Pair relatedness
• For non-recalled items (also used in FoK)
– Cue familiarity (Reder)
– Accessibility of partial information (Koriat)
• For recognition task
– Latency of recognition
– Cue familiarity
– Recollecting an episode in which the item has been
– Reasons given why answer may be wrong
– Reasons given why answer may be right
• In sum, metamemory judgments are modestly
related to performance (e.g., FoK relative
accuracy .35).
• However, this is not due to some mysterious
direct access to the trace
• Rather, it is due to use of inputs (cues) which
have varying degrees of predictiveness about
the target, from very low (e.g. deceptive items,
false beliefs on importance of word frequency) to
more positive (true relation between target and
• FoK is greatly impaired in Korsakoff
JoL & study time
• Intuitively, it makes sense that people will spend more
time studying those items that they believe they have not yet
mastered (i.e., low JoL items) (see Son & Metcalfe 2000
for an exhaustive review)
• Discrepancy-Reduction Hypothesis:
– There is a goal state and a current state, and people try to
minimize the distance between current and goal states.
– But people also use context. So if you are short on time, you will
devote that time to easier items rather than harder ones (Thiede &
Dunlosky, 1999).
– People also sometimes focus on items of intermediate difficulty
(Metcalfe, 2002)
• I think that in our study with Aaron what we
need to do is to find an input that we can
predict will be used for the JoL (e.g., ease
of reading, see Hertzog 2003) or
alternatively, look for a measure (delayed
JoL) that we think will be affected by
proactive interference.
• As it is now, why would PI affect
immediate JoL?? Sounds silly