
IRFA: A Topographical Approach to Text Assessment

Troy Bailey

University of Hyderabad / SIL Inc.

Text Assessment...

Without Readability Formulas



"Text Assessment" is...



leveling texts to readers



so that they are neither

too difficult...


nor too easy...


Approaches to Text Assessment


Comparing Approaches...


Global Techniques


  • Type of Output
      Pros: Reading level from a single score
      Cons: No indication as to the causes of unreadability
  • Portability
      Pros: None (language-specific)
      Cons: Application to new languages requires extensive calibration with subjects of various reading levels

Comparing Approaches...


Local Techniques


  • Type of Output
      Pros: Rich information about the location and severity of difficulty
      Cons: "Leveling" texts to readers means setting flexible thresholds
  • Portability
      Pros: High (evaluators need only be familiar with the script)
      Cons: None

IRFA...Origins




Inter-Reader Fluency Analysis

IRFA...


Inter-Reader Fluency Analysis


Uses oral reading data to:


The focus is thus on texts rather than readers.

IRFA... Manually

Imagine that we listen to a child reading the following text in Bengali. A listener need not be fully proficient in the language to identify a learner's disfluent utterances: unnatural pauses, hesitations, and even multi-word backtracks stand out quite distinctly to a careful listener. Assuming the reviewer is familiar with the script, she or he can simply mark these disfluent locations on a photocopy of the text.

IRFA... Manually

As we proceed from subject to subject, we note the various locations where each one pauses, hesitates, or backtracks. Unlike Miscue Analysis, IRFA ignores accuracy errors, because our goal is not to probe the cognitive processes of the reader but to estimate the cognitive load of the various structures in the text. Put theoretically, we are concerned first and foremost with fluency as a temporal phenomenon. [Many of these ideas derive from Jeffrey Walczyk's "Compensatory-Encoding Model", published in a journal of the International Reading Association, 2000.]

IRFA... Manually

The language here is Bengali, which is spoken in the state of West Bengal in eastern India and is the national language of Bangladesh.

IRFA...


Challenge:

When readers are still learning to read, how do we


Response:

Returning to our question...

IRFA...


Total Errors for Each Word in the Passage


By plotting shared disfluencies across the words of a passage, we see that not all "errors" are equal. Here the horizontal axis represents the words of the passage in the order they appear in the text, and the height of each peak represents the number of students who made some kind of fluency error at that point. The graph was generated with TopoText, a program developed in the course of this research.
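The tally behind such a topograph can be sketched in a few lines. This is a minimal sketch, not TopoText's actual code: it assumes each reader's marks are stored as a set of 0-based word positions where any disfluency occurred, and the function name `topograph` is hypothetical. Because each reader is a set, a reader counts at most once per word, so a peak's height is a number of readers, matching the description above.

```python
from typing import Iterable, List, Set

def topograph(marks_per_reader: Iterable[Set[int]], n_words: int) -> List[int]:
    """Count, for each word position, how many readers were disfluent there.

    Hypothetical data layout: one set of marked word positions per reader.
    """
    counts = [0] * n_words
    for marks in marks_per_reader:
        for pos in marks:
            if 0 <= pos < n_words:  # ignore marks outside the passage
                counts[pos] += 1
    return counts

# Three hypothetical readers marked on a six-word passage.
readers = [{1, 3}, {3}, {3, 5}]
print(topograph(readers, 6))  # [0, 1, 0, 3, 0, 1]
```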

(TopoText Software)...

With TopoText, a researcher or teacher can tag the various types of disfluency she observes for each student...

(TopoText Software)...

With one file for each student

(TopoText Software)...



The software then tallies these tags and produces the topograph. This is one portion of the analysis generated for the Bengali data that you saw earlier on the hard-copy code sheets.
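The per-type tally can be illustrated with a small sketch. TopoText's real file format is not described here, so this assumes a hypothetical one: each student's file holds one tag per line as `<word_index> <tag>` (for example `3 pause`). The function `tally_by_type` is likewise hypothetical; it counts each (position, tag) pair once per student and returns, for each tag, a per-word count.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

def tally_by_type(student_files: Iterable[str]) -> Dict[str, Counter]:
    """Tally disfluency tags per type and word position across students.

    Each element of `student_files` is the text of one student's file in a
    hypothetical "<word_index> <tag>" line format; "#" lines are comments.
    """
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for text in student_files:
        seen = set()  # count each (position, tag) at most once per student
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            index, tag = line.split(maxsplit=1)
            seen.add((int(index), tag.strip()))
        for pos, tag in seen:
            tallies[tag][pos] += 1
    return dict(tallies)

student_a = "3 pause\n3 hesitation\n7 backtrack\n"
student_b = "3 pause\n8 pause\n"
result = tally_by_type([student_a, student_b])
print(result["pause"][3])  # 2
```

Keeping one `Counter` per tag makes it easy to plot each disfluency type as its own topograph or to sum them into a single one.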

IRFA...


Here we see the topograph of a passage read by 32 high school students. Note the low-level "noise" in the data: many peaks reach only about 20-30% of the total height of the graph.

IRFA...

"Raising the Floor" by 30%

Here we see the same topograph, but with each data point lowered by 30% of the sample. By "raising the floor", we remove from view many of the idiosyncratic errors and find much sharper peaks. A quick glance back at the data reveals a kind of natural "floor" at around 30%, but what statistical justification do we have for treating the data this way?
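"Raising the floor" amounts to subtracting a constant from every point and clipping at zero. The sketch below assumes the floor is stated as a fraction of the sample and rounded to a whole number of readers (0.30 × 32 = 9.6, which rounds to 10); the rounding rule and the name `raise_floor` are assumptions for illustration.

```python
from typing import List

def raise_floor(counts: List[int], n_readers: int,
                floor_fraction: float = 0.30) -> List[int]:
    """Subtract a noise floor from a topograph and clip negative values to 0.

    The floor is converted from a fraction of the sample to a whole number
    of readers by rounding (an assumption; e.g. 0.30 * 32 -> 10).
    """
    floor = round(floor_fraction * n_readers)
    return [max(0, c - floor) for c in counts]

# Hypothetical per-word counts for 32 readers.
print(raise_floor([3, 12, 9, 25, 10], n_readers=32))  # [0, 2, 0, 15, 0]
```

Clipping at zero rather than letting values go negative keeps the graph a count of "excess" readers above the noise level, so only the sharper peaks survive.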

IRFA...


Data derived from PC-Size, Copyright G.E. Dallal. 1990.
Looking at a standard probability chart, we see that for a sample of 32 subjects, effects smaller than about 30% are not significant. Hence the justification for setting the floor at 10 readers (about 30% of 32) for this data.

Texts as Topographical



Hypothesis:


Texts have their own inherent "topography" with respect to processing load, such that both skilled and less skilled readers must negotiate this same inherent topography. Hence, the relative processing costs of the various vocabulary and syntactic structures are essentially the same for all readers of the text.
We can think of reading, in a sense, like mountain climbing. Some climbers are more experienced and can climb the mountain faster; others take more time. All, however, must negotiate the same terrain in their ascent. Reading works the same way with respect to skill.

Texts as Topographical


Evidence from Eye Fixations

The Eye-Voice Span

An important notion in understanding reading fluency is the Eye-Voice Span (EVS). Essentially, in oral reading the eye normally runs two or three words, or more, ahead of the voice. The length of this span is somewhat elastic, and it has been shown to vary not only with reader skill but also with particular linguistic aspects of the text. Figure 1, taken from Levin and Adis, 1979, plots the voice (V) and eye (E) positions of a "poor reader" at several observation points in the passage. The vertical solid bars represent eye-fixation points as measured by their equipment: the number above each bar indicates the sequence of the fixation, and the number below gives the duration (in 50ths of a second) that the eye paused at that location.
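The two quantities read off such a figure can be computed directly. This sketch assumes the span is measured in words (some studies use letters or time instead) and that each observation is a hypothetical `(eye_word_index, voice_word_index)` pair; the duration conversion follows from the figure's unit of 1/50 s, which is 20 ms.

```python
from typing import List, Tuple

MS_PER_UNIT = 20  # one fiftieth of a second = 20 ms

def eye_voice_spans(observations: List[Tuple[int, int]]) -> List[int]:
    """Return the EVS, in words, at each observation point.

    Each observation is a hypothetical (eye_word_index, voice_word_index)
    pair; a positive span means the eye is ahead of the voice.
    """
    return [eye - voice for eye, voice in observations]

def fixation_ms(duration_units: int) -> int:
    """Convert a fixation duration given in 50ths of a second to milliseconds."""
    return duration_units * MS_PER_UNIT

print(eye_voice_spans([(5, 3), (9, 6), (10, 10)]))  # [2, 3, 0]
print(fixation_ms(12))  # 240
```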

The Eye-Voice Span


EVS is sensitive to the following*:
  • Active/Passive Voice
  • Marked Direction of Embedding
  • Topic/Comment Roles


*Levin & Adis, 1979

As mentioned above, the EVS is sensitive to text difficulty, or text "constraints". Research in the 1960s revealed quite a few linguistic factors that contribute significantly to the size of the EVS. For example:
  • Passive sentences, particularly the agentive phrase, exhibit longer EVSs than their active counterparts.
  • Left-embedded clauses force longer spans in English.

Texts as Topographical



Hypothesis:


Both skilled and less skilled readers must negotiate the same inherent topography of a text. Hence, the relative processing costs of the various vocabulary and syntactic structures in a text are essentially the same for all readers of that text.
These kinds of findings lead me to the following hypothesis.

Texts as Topographical


Points of difficulty quite similar between groups

Pearson r = .76 (p<.05)

As the figure shows, the word-by-word fluency profiles of fast and slow readers are quite similar, differing mainly in degree. The Pearson correlation coefficient computed between the two profiles showed a relationship that was both substantial and statistically significant (r = .76, p < .05).
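For readers who want to reproduce this kind of comparison, here is a minimal pure-Python sketch of the Pearson coefficient applied to two word-by-word profiles. The profiles below are made up for illustration and are not the study's data; the significance test (p < .05) is a separate step not shown here.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

fast = [0, 1, 0, 4, 1, 0, 2]  # hypothetical per-word counts, fast readers
slow = [1, 3, 1, 9, 2, 1, 5]  # same passage, slow readers
print(round(pearson_r(fast, slow), 2))  # 0.99
```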

Texts as Topographical


Conclusion: After reviewing research on

There is ample research to support the notion that texts have their own inherent topography, apart from individual differences in skill, background or reading preferences.

Disfluency as Compensation

(C-EM, Walczyk, 2000)

Behaviors
Strategies
The essential difference is the amount of time required.
We can think of the following kinds of disfluency as compensatory strategies and behaviors. (I am distinguishing the two according to Walczyk's C-EM, which highlights

Disfluency as Compensation


Readers respond similarly to difficult items, despite reading skill level

When we compare fluency performance across skill groups, some interesting patterns emerge. Note that the diagonal shows the strongest correlations: at a given position in the text, the text structure seems to call for a particular kind of strategy, regardless of skill differences. This suggests that certain kinds of text difficulty demand certain kinds of compensatory behavior; in other words, text topography, not skill, determines the choice of reading strategy at a given place in the passage.
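The diagonal pattern can be checked mechanically. This sketch assumes each skill group is summarized as per-word counts for each strategy type; it correlates every strategy profile in one group with every profile in the other and tests whether the same-strategy cell is the largest in each row. The function names and sample data are hypothetical.

```python
import math
from typing import Dict, Sequence

Profiles = Dict[str, Sequence[float]]

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_group_matrix(group_a: Profiles, group_b: Profiles) -> Dict[str, Dict[str, float]]:
    """r between every strategy profile in group A and every one in group B."""
    return {
        name_a: {name_b: pearson_r(xa, xb) for name_b, xb in group_b.items()}
        for name_a, xa in group_a.items()
    }

def diagonal_is_strongest(matrix: Dict[str, Dict[str, float]]) -> bool:
    """True if, in each row, the same-strategy cell holds the largest r."""
    return all(row[name] == max(row.values()) for name, row in matrix.items() if name in row)

# Hypothetical per-word counts for two skill groups.
fast = {"pause": [0, 3, 0, 1], "hesitation": [2, 0, 0, 3]}
slow = {"pause": [1, 4, 0, 1], "hesitation": [3, 0, 1, 4]}
print(diagonal_is_strongest(cross_group_matrix(fast, slow)))  # True
```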

Finding the Locus of Difficulty (d)





What might "Reading Strategies" tell us?


Behaviors
Strategies

Finding the Locus of Difficulty (d)


The "Repair Level Hierarchy" (Bailey, 2005)

Some predictions...

Repair Strategies





To test these ideas, the data were:
  1) tagged according to the 5 disfluency types mentioned above;
  2) analyzed word by word, totaling each type across the sample of 32 readers;
  3) used to correlate the prevalence of each disfluency type with indices of text difficulty;
  4) extended to the lexical context, i.e. the two words before and the two words after the word position where the disfluency occurred.
The diagram shows the results. The y-axis shows the Pearson correlation, and the x-axis shows the word position relative to the disfluency (-2, -1, 0, +1, +2). The lines represent various indices of text difficulty: word length, graphic word length, and glyph density. [Glyph density is a metric developed for use with Indic scripts, which often hang vowel signs all around the base character; it estimates the amount of information per horizontal pixel within each word.]
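The positional analysis in steps 3 and 4 can be sketched as a lagged correlation: pair the disfluency count at word i with a difficulty index of word i + lag, for lags -2 to +2. This is a minimal sketch under that assumed reading of the procedure; the function names and sample data are hypothetical, and the index here is plain word length in characters rather than glyph density.

```python
import math
from typing import Dict, Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if a side is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(disfluency: Sequence[float], feature: Sequence[float],
                        max_lag: int = 2) -> Dict[int, float]:
    """Correlate the disfluency count at word i with a feature of word i + lag.

    Lags -2 and -1 look at the two words before the disfluent position,
    0 at the word itself, and +1 and +2 at the two words after it.
    """
    if len(disfluency) != len(feature):
        raise ValueError("profiles must cover the same words")
    n = len(disfluency)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        idx = [i for i in range(n) if 0 <= i + lag < n]  # stay inside the passage
        result[lag] = pearson_r([disfluency[i] for i in idx],
                                [feature[i + lag] for i in idx])
    return result

# Hypothetical data: pause counts per word and word length in characters.
pauses = [1, 3, 2, 5, 4, 2, 6]
word_len = [4, 2, 6, 3, 9, 7, 3]
for lag, r in lagged_correlations(pauses, word_len).items():
    print(f"{lag:+d}: {r:.2f}")
```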

Finding (d) from Repair Strategies




Recall the earlier question...

Finding (d) from Repair Strategies

We can see that: 1) decelerations (blue) precede peaks; 2) pauses (red) do as well; and 3) hesitations (yellow) coincide with peak locations.
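One way to use this pattern is to shift each strategy's counts to where the difficulty (d) is assumed to lie before summing them. The sketch below assumes, for illustration only, that decelerations and pauses land exactly one word before d and hesitations on d itself; these offsets and the function name `relocate` are hypothetical, not calibrated values from the study.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical offsets (in words) from where a strategy is observed to the
# assumed locus of difficulty (d). Illustrative guesses, not calibrated values.
DEFAULT_OFFSETS: Dict[str, int] = {"deceleration": 1, "pause": 1, "hesitation": 0}

def relocate(counts_by_type: Dict[str, Sequence[int]],
             offsets: Optional[Dict[str, int]] = None) -> List[int]:
    """Shift each strategy's per-word counts to its assumed locus, then sum.

    Types missing from `offsets` are left in place; counts shifted past the
    end of the passage are dropped.
    """
    if not counts_by_type:
        return []
    offsets = DEFAULT_OFFSETS if offsets is None else offsets
    n = len(next(iter(counts_by_type.values())))
    combined = [0] * n
    for tag, counts in counts_by_type.items():
        shift = offsets.get(tag, 0)
        for i, c in enumerate(counts):
            if 0 <= i + shift < n:
                combined[i + shift] += c
    return combined

counts = {"deceleration": [0, 2, 0, 0], "pause": [0, 1, 0, 0], "hesitation": [0, 0, 4, 0]}
print(relocate(counts, offsets={}))  # unshifted: [0, 3, 4, 0]
print(relocate(counts))              # shifted:   [0, 0, 7, 0]
```

In this toy case the shifted sum concentrates all seven observations on one word, which is the "sharper peak" effect the next slide describes.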

Implications for IRFA



Attention to strategies is useful because:

  • Locating the true point of difficulty becomes more accurate
  • This results in sharper peaks in the graph.

EVALUATION





So ...?

Inter-Rater Reliability

Expert Intuitions


A panel of ten professional mother-tongue (MT) speakers of Telugu was selected, including:
The inter-rater reliability (kappa) was k = 0.27, which is quite low.

Inter-Rater Reliability

IRFA Trainees


Three linguistics graduate students were given brief (30-second) instruction in IRFA tagging:

Conclusion:

IRFA assessments, even with minimal training of the raters, show higher inter-rater agreement than expert intuitions in identifying difficult vocabulary (k = .51 vs. k = .27).

Tentative Conclusions


Thanks to: