IRFA: A Topographical Approach to Text Assessment
Troy Bailey
University of Hyderabad / SIL Inc.
Text Assessment...
Without Readability Formulas
"Text Assessment" is leveling texts to readers so that they are neither too difficult nor too easy.
Approaches to Text Assessment
- Global
- Readability Formulae (e.g., Fry, Flesch-Kincaid etc.)
- The "Cloze Procedure" (Taylor, 1953)
- Local
- Eye-Voice Span Measures (Buswell, 1920)
- Inter-Reader Fluency Analysis (Bailey, 2005)
Comparing Approaches...
Global Techniques
| | Pros | Cons |
| Type of Output | Reading level from a single score | No indication as to causes of unreadability |
| Portability | None - Language-Specific | Application to new languages requires extensive calibration with subjects of various reading levels |
Comparing Approaches...
Local Techniques
| | Pros | Cons |
| Type of Output | Rich info as to location and severity of difficulty | "Leveling" texts to readers means setting flexible thresholds |
| Portability | High - evaluators must only be familiar with script | None |
IRFA...Origins
Inter-Reader Fluency Analysis
IRFA...
Inter-Reader Fluency Analysis
Uses oral reading data to:
- identify difficult words or structures, and
- estimate the relative difficulty of these
The focus is thus on texts rather than readers.
IRFA... Manually
Imagine that we listen to a child reading the following text in Bengali.
A person need not be fully proficient in the language in order to identify the disfluent utterances
of learners. Unnatural pauses, hesitations, and even multi-word backtrackings emerge quite
distinctly for the careful listener.
Assuming that the reviewer is familiar with the script, she or he can simply mark these disfluent
locations on a photocopy of the text.
IRFA... Manually
As we proceed from subject to subject, we note the various locations where
each makes some kind of pause, hesitation, or backtrack. Unlike Miscue Analysis, IRFA
ignores accuracy errors, because our goal is not to probe the cognitive processes of the reader,
but rather to estimate the cognitive load of the various structures in the text.
Put theoretically, we are concerned first of all with fluency as a temporal phenomenon. [Many of the
ideas for this derive from Jeffrey Walczyk's "Compensatory-Encoding Model", in the Journal of the
International Reading Assoc., 2000.]
IRFA... Manually
The language here is Bengali, spoken both in the state of West Bengal, in eastern India,
and as the national language of the country of Bangladesh.
IRFA...
Challenge:
When readers are still learning to read, how do we
- distinguish idiosyncratic / developmental variability
- from text-driven variability?
Response:
- Look for shared errors across a sample of readers.
- If clear patterns don't appear, sample more readers.
Returning to our question...
IRFA...
Total Errors for Each Word in the Passage
By plotting shared disfluencies across the words of a passage,
we see that not all "errors" are equal. Here the horizontal axis represents the words
of the passage in the order they appear in the text. The height of each peak represents the
number of students who made some kind of fluency error at that point.
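The tally behind such a plot can be sketched in a few lines. This is a minimal illustration, not TopoText's actual implementation: the function name, the 0-based word indices, and the sample marks are all hypothetical, and it assumes that a reader counts at most once per word.

```python
from collections import Counter


def tally_disfluencies(marked_positions_per_reader, n_words):
    """Count, for each word position, how many readers were disfluent there."""
    counts = Counter()
    for positions in marked_positions_per_reader:
        # Assumption: a reader counts at most once per word,
        # however many marks were made at that word.
        counts.update(set(positions))
    return [counts[i] for i in range(n_words)]


# Hypothetical marks for three readers of a six-word passage.
readers = [[1, 4], [4], [1, 4, 4, 5]]
topograph = tally_disfluencies(readers, n_words=6)
print(topograph)  # [0, 2, 0, 0, 3, 1]
```

Each value in the resulting list is the height of the peak at that word position.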
The graph was generated from TopoText, a program developed in the course of the research.
(TopoText Software)...
With TopoText, a researcher or teacher can tag the various types of
disfluency she observes for each of the students...
(TopoText Software)...
With one file for each student
(TopoText Software)...
The software then tallies these and produces the topograph.
This is one portion of the analysis generated for the Bengali data you saw
from the hard copy code sheets.
IRFA...
Here we see the topograph of a passage read by 32 high school students.
Note the low-level "noise" in the data. We can see many peaks which reach only about 20-30%
of the total height of the graph.
IRFA...
"Raising the Floor" by 30%
Here we see the same topograph, but with 30% subtracted from each data point.
By "raising the floor" we can remove from view many of the idiosyncratic errors and find
much sharper peaks.
While a quick glance back at the data reveals a kind of natural "floor" at around 30%,
what statistical justification do we have for treating the data like this?
IRFA...
Data derived from PC-Size, Copyright G.E. Dallal. 1990.
Looking at a standard probability chart, we see that for a sample of
32 subjects, effects smaller than about 30% are not statistically significant. Hence the
justification for setting the floor at 10 (roughly 30% of 32) for this data.
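As a rough sketch of "raising the floor", the snippet below subtracts a floor of about 30% of the sample size from each per-word count and clips negative results to zero. The function name, the clipping to zero, and the sample counts are assumptions made for illustration; only the 30% figure and the sample of 32 readers come from the discussion above.

```python
def raise_floor(counts, n_readers, floor_fraction=0.30):
    """Subtract a floor from every per-word count, clipping at zero."""
    # For 32 readers and a 30% floor this gives round(9.6) == 10.
    floor = round(floor_fraction * n_readers)
    return [max(0, c - floor) for c in counts]


# Hypothetical per-word disfluency counts from 32 readers.
counts = [3, 12, 8, 25, 10, 31]
raised = raise_floor(counts, n_readers=32)
print(raised)  # [0, 2, 0, 15, 0, 21]
```

Words whose counts fall at or below the floor drop out of view, leaving only the shared peaks.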
Texts as Topographical
Hypothesis:
Texts have their own inherent "topography" with respect to processing load, such that
both skilled and less skilled readers must negotiate this same inherent topography. Hence,
the relative processing costs of the various vocabulary and syntactic structures
are essentially the same for all readers of the text.
We can think of reading, in a sense, like mountain climbing. Some climbers are more experienced
and can climb the mountain faster. Others take more time. All, however, must negotiate
the same terrain in their ascent. Reading is the same with respect to skill.
Texts as Topographical
Evidence from Eye Fixations
- Content words are fixated about 85% of the time, while function words are fixated only about 35% of the time (Carpenter & Just, 1983).
- As word length increases, the probability of fixating (and refixating) a word increases (Rayner and McConkie, 1976).
- Short words (2 or 3 letters) are fixated around 25% of the time, whereas longer words (8 letters or longer) are almost always fixated and are often refixated (Starr and Rayner, 2001).
The Eye-Voice Span
An important notion in the understanding of reading fluency is that of
Eye-Voice Span. Essentially, the idea is that in oral reading, the eye normally moves two or three
words, or more, in front of the voice. The length of this span is somewhat elastic, and has
been shown to vary not only with reader skill, but with the particular linguistic aspects
of the text.
Figure 1 was taken from Levin and Adis, 1979, and plots the voice (V) and eye (E)
positions of a "poor reader" at several observation points in the passage. The solid
vertical bars represent eye-fixation points as measured by his equipment; the number
above each bar indicates the sequence of the fixation, and the number below gives the
duration (in 50ths of a second) for which the eye paused at that location.
The Eye-Voice Span
EVS is sensitive to the following*:
- Active/Passive Voice
- Marked Direction of Embedding
- Topic/Comment Roles
*Levin & Adis, 1979
As mentioned above, the EVS is sensitive to text difficulty or text "constraints".
Research in the 1960s revealed quite a few linguistic factors which make a significant
contribution to the size of EVS spans.
* Passive sentences, particularly the agentive phrase, exhibit longer EVSs than their
active counterparts.
* Left-embedded clauses force longer spans in English.
Texts as Topographical
Hypothesis:
Both skilled and less skilled readers must negotiate the same
inherent topography of a text. Hence, the relative processing costs
of the various vocabulary and syntactic structures in a text are
essentially the same for all readers of a text.
These kinds of findings lead me to this hypothesis.
Texts as Topographical
Points of difficulty quite similar between groups
Pearson r = .76 (p<.05)
As the figure shows, the word-by-word fluency for fast and slow readers is
quite similar, differing mainly by degree. When the Pearson correlation coefficient
was computed on the data, the strength of the relationship was both substantial and
statistically significant (r = .76, p < .05).
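For readers who want to run this kind of comparison on their own data, here is a minimal sketch of the Pearson coefficient applied to two per-word disfluency profiles. The counts for the "fast" and "slow" groups are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-word disfluency counts for two skill groups.
fast = [0, 1, 0, 4, 2, 0]
slow = [1, 3, 1, 9, 5, 2]
r = pearson_r(fast, slow)
print(f"r = {r:.2f}")
```

A high r here means the two groups stumble at the same words, even if the slower group does so more often.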
Texts as Topographical
Conclusion: Having reviewed research on
- Eye Fixations
- Eye Movements
- Eye-Voice Spans
we find ample evidence to support the notion that texts have their own inherent
topography, apart from individual differences in skill, background or reading preferences.
Disfluency as Compensation
(C-EM, Walczyk, 2000)
Behaviors
- SLO-Decelerations
- PAU-Pauses
- HES-Hesitations
Strategies
- CMP-Compound Errors
- REP-Repetitions
We can think of these kinds of disfluency as compensatory strategies and behaviors.
(I am distinguishing the two according to Walczyk's C-EM; the essential difference
is the amount of time required.)
Disfluency as Compensation
Readers respond similarly to difficult items, despite reading skill level
Note that the diagonal shows the strongest correlations. In other words, at given
positions in the text, text structures seem to require a certain kind of strategy,
apart from skill differences.
When examining fluency performance between different skill groups, some interesting
patterns emerge.
There is evidence to suggest that certain kinds of text difficulty
demand certain kinds of compensatory behaviors.
In other words, text topography, and not skill, appears to determine the choice of reading
strategy at a given place in the passage.
Finding the Locus of Difficulty (d)
What might "Reading Strategies" tell us?
Behaviors
- SLO-Decelerations
- PAU-Pauses
- HES-Hesitations
Strategies
- CMP-Compound Errors
- REP-Repetitions
Finding the Locus of Difficulty (d)
The "Repair Level Hierarchy" (Bailey, 2005)
Some predictions...
Repair Strategies
In order to test these ideas:
1) The data were tagged according to the 5 disfluency types mentioned above.
2) The data were analyzed in a word-by-word fashion, totalling each type across the sample of 32 readers.
3) Correlations between the prevalences of the disfluency types were computed.
4) Furthermore, correlations were gathered for the lexical context (the two words before and
the two words after the word position where the disfluency occurred).
The current diagram shows the results for these data.
The y-axis shows the Pearson correlation.
The x-axis shows the word position: -2, -1, 0, 1, 2.
The lines represent various indices of text difficulty: word length, graphic word length,
and glyph density [a metric developed for use with Indic scripts, which often hang vowel signs
in 360 degrees around the character. Glyph density thus estimates the amount of information
per horizontal pixel within each word.]
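The context analysis in step 4 can be sketched as a lagged correlation. The snippet below is an illustration under stated assumptions, not the analysis code used in the study: it uses plain word length as the difficulty index, the data are invented, and the sign convention (an offset of +k pairs the disfluency count at word i with the index of the word k positions later in the text) is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def lagged_correlations(disfluency_counts, difficulty_index,
                        offsets=(-2, -1, 0, 1, 2)):
    """Correlate disfluency at word i with the difficulty index at word i + offset."""
    n = len(disfluency_counts)
    result = {}
    for k in offsets:
        # Keep only the positions where the neighbouring word exists.
        pairs = [(disfluency_counts[i], difficulty_index[i + k])
                 for i in range(n) if 0 <= i + k < n]
        xs, ys = zip(*pairs)
        result[k] = pearson_r(xs, ys)
    return result


# Hypothetical passage: word lengths and per-word disfluency totals.
word_lengths = [3, 8, 2, 5, 11, 4, 2, 9]
disfluencies = [1, 9, 0, 3, 14, 2, 0, 10]
by_offset = lagged_correlations(disfluencies, word_lengths)
for k in sorted(by_offset):
    print(f"offset {k:+d}: r = {by_offset[k]:.2f}")
```

Running the same function once per disfluency type and once per difficulty index would give one line per index in a plot like the one described above.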
Finding (d) from Repair Strategies
Recall the earlier question...
Finding (d) from Repair Strategies
We can see that:
1) Decelerations (blue) precede peaks
2) Pauses (red) precede peaks as well
3) Hesitations (yellow) correspond to peak locations
Implications for IRFA
Attention to strategies is useful in that:
- Locating the true point of difficulty becomes more accurate
- This results in sharper peaks in the graph.
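One way to picture how strategy tags could sharpen the peaks is to credit anticipatory disfluencies to the word that follows them. The sketch below is purely illustrative: the one-word lead, the choice of SLO and PAU as the anticipatory tags, the function name, and the data are all assumptions, and this is not presented as the procedure used in the study.

```python
def relocate_to_locus(counts_by_type, lead=1, anticipatory=("SLO", "PAU")):
    """Combine per-type counts, shifting anticipatory types forward by `lead` words."""
    n_words = len(next(iter(counts_by_type.values())))
    combined = [0] * n_words
    for tag, counts in counts_by_type.items():
        shift = lead if tag in anticipatory else 0
        for i, count in enumerate(counts):
            # Marks near the end of the passage are credited to the last word.
            target = min(i + shift, n_words - 1)
            combined[target] += count
    return combined


# Hypothetical per-word counts for three disfluency types.
counts = {
    "SLO": [0, 3, 0, 0, 2, 0],
    "PAU": [0, 2, 0, 0, 1, 0],
    "HES": [0, 0, 4, 0, 0, 3],
}
unshifted = [sum(column) for column in zip(*counts.values())]
relocated = relocate_to_locus(counts)
print(unshifted)  # [0, 5, 4, 0, 3, 3]
print(relocated)  # [0, 0, 9, 0, 0, 6]
```

In this toy example the two broad humps in the unshifted totals collapse into two sharp peaks once the anticipatory marks are moved onto the words they precede.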
EVALUATION
So ...?
Inter-Rater Reliability
Expert Intuitions
A panel of ten professional, mother-tongue (MT) speakers of Telugu was selected, including:
- Three university professors (Telugu literature and linguistics),
- Four graduate level (MA and PhD) students of linguistics, and
- Three experienced school teachers
The inter-rater reliability (Kappa) was k=0.27 (quite low).
Inter-Rater Reliability
IRFA Trainees
Three linguistics grad. students were given brief (30 sec.) instruction in IRFA tagging:
- Readers: six high school students
- Passage: "Rudraiah" (the same as in the above study)
- Inter-rater reliability: k = 0.51 (moderately good).
Conclusion:
IRFA assessments, even with minimal training of the raters, are more reliable than expert intuitions for identifying difficult vocabulary (k=.51 vs. k=.27).
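The slides do not say which kappa statistic was used. Since both studies involve more than two raters, the sketch below assumes Fleiss' kappa, which is one common choice in that situation; the ratings are invented. Each row is one word, and the two columns hold how many of three raters marked it "difficult" and "not difficult".

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table of per-item category counts.

    table[i][j] is the number of raters who put item i in category j;
    every item must be rated by the same number of raters.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Observed agreement, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions.
    n_categories = len(table[0])
    proportions = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_expected) / (1 - p_expected)


# Hypothetical ratings of eight words by three raters: [difficult, not difficult].
ratings = [[3, 0], [0, 3], [2, 1], [0, 3], [1, 2], [3, 0], [0, 3], [0, 3]]
kappa = fleiss_kappa(ratings)
print(f"kappa = {kappa:.2f}")  # kappa = 0.64
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance.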
Tentative Conclusions
- Local (fluency) techniques, rather than Global ones (e.g., readability formulae), are better suited for classroom situations and minority language development programs.
- Fluency ratings on real data seem more reliable than expert intuitions about difficult vocabulary.
- Texts have their own inherent topography which skill may mask, but not bypass.
- The "Repair Strategy Hierarchy" illustrates the diminishing options available to readers of various skill levels when difficult structures are encountered.
Thanks to:
- Faculty of CALTS (Prof. B. R. Bapuji, Prof. P. Mohanty, Prof. G. Umamaheshwara Rao)
- Students of CALTS for help with Telugu text checking
- Elementary school students in Hyderabad, and in various places in Bangladesh. Thank you for your help!