Reading in Hindi and Telugu
Observations from the EVS technique
Troy Bailey,
Univ. of Hyderabad, CALTS
I. The Eye-Voice Span (EVS)
- When we read aloud, our eyes move several words ahead of our voice.
- The distance between the eyes and the voice in oral reading is the Eye-Voice Span.
I. The Eye-Voice Span - Function
- "Look-ahead" behavior is common to many human activities, from playing tennis to reading music.
"By successfully anticipating future events and skillfully coordinating overlapping movements, the expert performer is able to circumvent potential limits on basic elements of serial reactions" (Ericsson and Charness, 1994).
- By scanning ahead in the text, the reader creates a kind of buffer which allows him or her time to plan articulation, or to resolve unfamiliar words or structures (Blumenthal, 1970).
I. The Eye-Voice Span - History
Various uses:
- 1897 - Quantz used EVS to understand eye movements and reading processes
- Early 20th century - teachers used it to measure reading ability
- 1960s - psycholinguistics confronts UG claims
I. The Eye-Voice Span - Technique
- The researcher pre-selects a "target word"
- The subject reads the passage aloud
- As the subject reads the word, the text is removed
- Count the number of words spoken after the target word. This is the EVS.
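The counting step above can be sketched in Python. This is only an illustration under stated assumptions: the function name `eye_voice_span`, the sample passage, and the scoring rule (count words that match the text in order, stopping at the first mismatch) are hypothetical and not taken from the study.

```python
import string


def _normalize(word):
    # Ignore case and surrounding punctuation when comparing words.
    return word.strip(string.punctuation).lower()


def eye_voice_span(passage, target_index, spoken_after_removal):
    """Return the number of words reported after the target word.

    passage: the text shown to the reader (hypothetical sample below).
    target_index: 0-based position of the pre-selected target word.
    spoken_after_removal: what the reader said after voicing the target,
        that is, after the text was removed.
    """
    upcoming = passage.split()[target_index + 1:]
    spoken = spoken_after_removal.split()
    span = 0
    # Assumed scoring rule: stop counting at the first word that
    # does not match the passage.
    for expected, said in zip(upcoming, spoken):
        if _normalize(expected) != _normalize(said):
            break
        span += 1
    return span


# Hypothetical passage; the target word is "walked" (index 2).
passage = "The boy walked to the market and bought some fruit."
print(eye_voice_span(passage, 2, "to the market and"))
```

A stricter or looser scoring rule (for example, accepting synonyms or guesses) would change the count, so the rule should be fixed before data collection.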
I. The Eye-Voice Span - Technique
*From Levin & Addis (1979)
I. The Eye-Voice Span - Targets
- Line-Final (Lf) targets give the best average spans for the passage
- Sentence-Initial (Si) targets are more sensitive to individual textual constraints.
I. The Eye-Voice Span - Text Variables
Influences on the EVS*
- Syntactic Structure
- Syntactic Boundaries
- Semantic Structure
- Text Difficulty
- Text Genre
*[Clark, 1972; Lawson, 1961; Levin & Addis, 1979; Levin & Kaplan, 1968; Morton, 1964]
I. The Eye-Voice Span - Evaluation
The EVS is useful for investigating various kinds of textual constraints
II. Word Breaks and Reading - History
- Earliest writing systems (e.g., Sanskrit, Hebrew, Aramaic) contain no word breaks
- Early writing used mostly for oral presentation
- Silent reading appears only after the introduction of spaces between words (Saenger, 1997).
II. Word Breaks and Reading - Strategies
Strategies for marking word-boundaries:
- High Freq. Grammatical Particles:
- Chinese : (e.g., "zi", "de", "le")
- Alternate Symbol Sets:
- Arabic : special word-final graphemes.
- Japanese : case marked with "kana" symbols / content words marked with "kanji" (Chinese characters)
II. Word Breaks and Reading - Psycholingx
- The space between words influences the landing position of the next saccadic "jump" (Morris, Rayner, & Pollatsek, 1990).
II. Word Breaks and Reading - Psycholingx
Saccadic Movements*
*From Larson, 2004
II. Word Breaks and Reading - Psycholingx
Visual Field
II. Word Breaks and Reading - Case Markers
North Indian languages encode case information with post-positions, whereas South Indian languages tend to use bound morphs. For example:
Percentage in TDIL corpora of Top 5 Words*
*Bharati, Akshar, Prakash Rao K, Rajeev Sangal, S M Bendre. Basic Statistical Analysis of Corpus and Cross Comparison among Corpora. In Proceedings of ICON-2002, Mumbai, 18-21 Dec 2002.
II. Word Breaks and Reading - Case Markers
North Indian languages encode case information with post-positions, whereas South Indian languages tend to use bound morphs. For example:
Word Length*
*Bharati, Akshar, Prakash Rao K, Rajeev Sangal, S M Bendre. Basic Statistical Analysis of Corpus and Cross Comparison among Corpora. In Proceedings of ICON-2002, Mumbai, 18-21 Dec 2002.
II. Word Breaks and Reading - Query
As such, should we expect Hindi and Telugu to be read in the same way?
III. Experiment: Hindi EVS
- 17 mother-tongue (MT) readers of Hindi (Masters & PhD students)
- Three Primary School Texts
Findings
- Hindi readers exhibit spans from 2-10 words
- Readers tend to read in grammatical phrases.
III. Experiment: Telugu EVS
- 19 mother-tongue (MT) readers of Telugu (Masters & PhD students)
- Two Primary School Texts
Findings
- Telugu readers exhibit spans from 1-4 words
- Readers tend to read in grammatical phrases.
IV. Conclusions
- Eye-Voice Spans are greater for Hindi readers simply because the script treats case markers as individual words.
- For cross-language comparisons, the salient unit seems to be the "grammatical phrase".
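The difference between counting in words and counting in grammatical phrases can be illustrated with a short Python sketch. The two transliterated sentences ("Ram gave Sita a book") are illustrative examples chosen here, not items from the experimental texts, and the phrase segmentation is hand-coded as an assumption; `count_units` is a hypothetical helper.

```python
def count_units(phrases):
    """Return (number of orthographic words, number of grammatical phrases)."""
    words = sum(len(phrase) for phrase in phrases)
    return words, len(phrases)


# Hindi writes case markers (ne, ko) as separate words.
hindi = [["raam", "ne"], ["siitaa", "ko"], ["kitaab"], ["dii"]]

# Telugu attaches case markers to the noun as bound morphs.
telugu = [["raamuDu"], ["siitaku"], ["pustakam"], ["iccaaDu"]]

for name, sentence in (("Hindi", hindi), ("Telugu", telugu)):
    words, phrases = count_units(sentence)
    print(f"{name}: {words} words, {phrases} phrases")
```

Under this segmentation the same content takes more orthographic words in Hindi than in Telugu while the phrase count is the same, which is why a span measured in words is not directly comparable across the two scripts.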
IV. Conclusions
Areas for further exploration:
- Degree of interest / reader engagement with the text
- Consequences of various orthographic choices among neo-literary languages
Thanks to:
- Faculty of CALTS (Prof. B. R. Bapuji, Prof. P. Dasgupta, Prof. P. Mohanty, Prof. G. Sengupta)
- Faculty and students of CIEFL, HCU & Lucknow University