IJAES Volume 5, Issue 1, Article 1
Author
Jinyan Huang

Abstract
Because ESL students come from different linguistic and cultural backgrounds, the assessment of their English writing is a problematic area. On the one hand, many factors affect ESL students’ writing, including their English proficiency, mother tongue, home culture, and style of written communication, and raters may weigh these factors differently when rating ESL students’ writing. On the other hand, empirical studies have found that many factors (e.g., raters’ linguistic backgrounds, previous experiences, and prior training in assessment) affect the rating of ESL students’ writing. The impact of these factors raises questions about the accuracy, precision, and, ultimately, the fairness of the assessment of ESL students’ writing. This paper reviews 20 major empirical studies that investigated the factors affecting the rating of ESL writing in North American school contexts. These factors are categorized into two types: rater-related and task-related. Rater-related factors include the rating methods used, the rating criteria, and raters’ academic disciplines, professional experiences, linguistic backgrounds, tolerance for error, perceptions and expectations, and training. Task-related factors include the types and difficulty levels of writing tasks. The paper also identifies research gaps and proposes directions for future research in the assessment of ESL writing.
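The reliability concerns summarized above are commonly quantified by comparing two raters’ scores on the same set of essays, for example with an exact-agreement rate and a Spearman rank correlation (cf. Spearman, 1904, in the reference list). The sketch below is purely illustrative: the rater names, the 1–6 holistic scale, and the scores are hypothetical assumptions, not data from any of the reviewed studies.

```python
# Illustrative sketch only: two simple indices of inter-rater consistency.
# All scores below are hypothetical, not data from the reviewed studies.

def ranks(xs):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical holistic scores (1-6 scale) from two raters on ten essays.
rater_a = [4, 5, 3, 6, 2, 4, 5, 3, 4, 6]
rater_b = [4, 4, 3, 6, 3, 5, 5, 2, 4, 6]

exact_agreement = sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)
print(f"exact agreement: {exact_agreement:.2f}")   # → 0.60
print(f"Spearman rho:    {spearman(rater_a, rater_b):.2f}")
```

Exact agreement only credits identical scores, whereas the rank correlation also rewards raters who order the essays similarly; studies using generalizability theory or many-facet Rasch measurement (see Brennan, 2001; Linacre, 1989) model rater severity in far more detail than this sketch.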
References

American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1-42.
Blackett, K. (2002). Ontario schools losing English as a second language programs – despite increase in immigration. Retrieved August 31, 2004, from http://www.peopleforeducation.com/releases/2003/oct24_02.html
Brennan, R. L. (2001). Statistics for social science and public policy: Generalizability theory. New York: Springer-Verlag.
Brown, J. D. (1991). Do English and ESL faculties rate writing samples differently? TESOL Quarterly, 25(4), 587-603.
Brown, J. D., Hilgers, T., & Marsella, J. (1991). Essay prompts and topics. Written Communication, 8, 533-556.
Canadian Bureau for International Education. (2002, April 15). International student numbers hit record high, but Canada offers dwindling support for African students. Retrieved October 28, 2002, from http://www.cbie.ca/news/index_e.cfm?folder=releases&page=rel_2002-04-15_e
Casanave, C. P., & Hubbard, P. (1992). The writing assignments and writing problems of doctoral students: Faculty perceptions, pedagogical issues, and needed research. English for Specific Purposes, 11, 33-49.
Cole, N. S., & Moss, P. A. (1989). Bias in test use. In R. L. Linn (Ed.), Educational measurement (pp. 201-219). New York: Macmillan.
Cole, N. S., & Zieky, M. J. (2001). The new faces of fairness. Journal of Educational Measurement, 38, 369-382.
Connor-Linton, J. (1995). Looking behind the curtain: What do L2 composition ratings really mean? TESOL Quarterly, 29(4), 762-765.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cumming, A. (1990a). Application of contrastive rhetoric in advanced ESL writing. Paper presented at the 24th Annual TESOL Conference, San Francisco, CA.
Cumming, A. (1990b). Expertise in evaluating second language composition. Language Testing, 7, 31-51.
Davidson, F. (1991). Statistical support for training in ESL composition rating. In L. Hamp-Lyons (Ed.), Assessing second language writing (pp. 155-165). Norwood, NJ: Ablex.
Education Quality and Accountability Office. (2002). Ontario Secondary School Literacy Test, February 2002: Report of provincial results. Toronto: Queen’s Printer for Ontario.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4, 139-155.
Ericsson, K. A., & Simon, H. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Ferris, D. (1994). Rhetorical strategies in student persuasive writing: Differences between native and nonnative English speakers. Research in the Teaching of English, 28, 45-65.
Gamaroff, R. (2000). Rater reliability in language assessment: The bug of all bears. System, 28, 31-53.
Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 69-87). Cambridge: Cambridge University Press.
Hamp-Lyons, L. (1991a). Issues and directions in assessing second language writing in academic contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 323-329). Norwood, NJ: Ablex.
Hamp-Lyons, L. (1991b). Rating procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241-277). Norwood, NJ: Ablex.
Hamp-Lyons, L. (1996). The challenges of second language writing assessment. In E. White, W. Lutz, & S. Kamusikiri (Eds.), Assessment of writing: Policies, politics, practice (pp. 226-240). New York: Modern Language Association.
Hamp-Lyons, L., & Mathias, S. P. (1994). Examining expert judgments of task difficulty on essay tests. Journal of Second Language Writing, 3(1), 49-68.
Hayward, M. (1990). Evaluations of essay prompts by nonnative speakers of English. TESOL Quarterly, 24, 753-758.
Hinds, J. (1987). Reader versus writer accountability: A new typology. In U. Connor & R. Kaplan (Eds.), Writing across languages: Analysis of L2 text (pp. 141-152). Reading, MA: Addison-Wesley.
Hinkel, E. (2003). Simplicity without elegance: Features of sentences in L1 and L2 academic texts. TESOL Quarterly, 37, 275-301.
Homburg, T. (1984). Holistic evaluation of ESL compositions: Can it be validated objectively? TESOL Quarterly, 18(1), 87-107.
Huot, B. A. (1990). Reliability, validity, and holistic rating: What we know and what we need to know. College Composition and Communication, 41, 201-213.
Institute of International Education. (2001, May 16). 98/99 open doors on the Web. Retrieved June 15, 2002, from http://www.opendoorsweb.org/Lead%20Stories/international_studs.htm
Intaraprawat, P., & Steffensen, M. S. (1995). The use of metadiscourse in good and poor ESL essays. Journal of Second Language Writing, 4(3), 253-272.
Jacobs, H. L., Zingraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.
Janopoulos, M. (1992). University faculty tolerance of NS and NNS writing errors: A comparison. Journal of Second Language Writing, 1(2), 109-121.
Janopoulos, M. (1995). Writing across the curriculum, writing proficiency exams, and the NNS college student. Journal of Second Language Writing, 4, 43-50.
Johns, A. M. (1991). Interpreting an English competency examination: The frustrations of an ESL science student. Written Communication, 8(3), 379-401.
Johnson, R. L., Penny, J., & Gordon, B. (2000). The relation between score resolution methods and interrater reliability: An empirical study of an analytic rating rubric. Applied Measurement in Education, 13(2), 121-138.
Joint Advisory Committee. (1993). Principles for fair student assessment practices for education in Canada. Edmonton, AB.
Kobayashi, T. (1992). Native and nonnative reactions to ESL compositions. TESOL Quarterly, 26, 81-112.
Kunnan, A. J. (1997). Connecting fairness and validation. In A. Huhta, V. Kohonen, L. Kurki-Suomo, & S. Luoma (Eds.), Current developments and alternatives in language assessment (pp. 85-105). Jyvaskyla, Finland.
Kunnan, A. J. (Ed.). (2000). Fairness and validation in language assessment. Cambridge: Cambridge University Press.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago, IL: MESA Press.
Linn, R. L. (1989). Educational measurement (3rd ed.). New York: Macmillan.
Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13, 5-8.
McDaniel, B. A. (1985). Ratings vs. equity in the evaluation of writing. Paper presented at the annual meeting of the Conference on College Composition and Communication, Minneapolis, MN. (ERIC Document Reproduction Service No. ED 260 459)
Mendelsohn, D., & Cumming, A. (1987). Professors’ ratings of language use and rhetorical organization in ESL compositions. TESL Canada Journal, 5, 9-26.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103). New York: Macmillan.
Ostler, S. (1987). English in parallels: A comparison of English and Arabic prose. In U. Connor & R. Kaplan (Eds.), Writing across languages: Analysis of L2 text (pp. 169-185). Reading, MA: Addison-Wesley.
Perkins, K. (1983). On the use of composition rating techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly, 17(4), 651-671.
Reid, J., & O’Brien, M. (1981). The application of holistic grading in an ESL writing program. Paper presented at the annual convention of Teachers of English to Speakers of Other Languages, Detroit, MI. (ERIC Document Reproduction Service No. ED 221 044)
Rubin, D. L., & Williams-James, M. (1997). The impact of writer nationality on mainstream teachers’ judgments of composition quality. Journal of Second Language Writing, 6(2), 139-153.
Ruetten, M. K. (1994). Evaluating ESL students’ performance on proficiency exams. Journal of Second Language Writing, 3, 85-96.
Russikoff, K. A. (1995). A comparison of writing criteria: Any differences? Paper presented at the annual meeting of the Teachers of English to Speakers of Other Languages, Long Beach, CA.
Sakyi, A. (2000). Validation of holistic rating for ESL writing assessment: How raters evaluate ESL compositions. In A. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129-152). Cambridge: Cambridge University Press.
Santos, T. (1988). Professors’ reactions to the writing of nonnative-speaking students. TESOL Quarterly, 22(1), 69-90.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Shaw, P., & Liu, E. T.-K. (1998). What develops in the development of second language writing. Applied Linguistics, 19, 225-254.
Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native English-speaking and ESL students? Journal of Second Language Writing, 5(2), 163-182.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.
Speck, B. W., & Jones, T. R. (1998). Direction in the grading of writing? In F. Zak & C. C. Weaver (Eds.), The theory and practice of grading: Problems and possibilities (pp. 17-29). Albany: SUNY Press.
Stiggins, R. J., & Bridgeford, N. J. (1983). An analysis of published tests of writing proficiency. Educational Measurement: Issues and Practice, 2(1), 6-19.
Sweedler-Brown, C. O. (1993). ESL essay evaluation: The influence of sentence-level and rhetorical features. Journal of Second Language Writing, 2, 3-17.
Thompson, R. (1990). Writing proficiency tests and remediation: Some cultural differences. TESOL Quarterly, 24, 99-102.
Vann, R., Lorenz, F., & Meyer, D. (1991). Error gravity: Faculty response to errors in the written discourse of nonnative speakers of English. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 181-195). Norwood, NJ: Ablex.
Vann, R., Meyer, D., & Lorenz, F. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18, 427-440.
Vaughan, C. (1991). Holistic assessment: What goes on in the raters’ minds? In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111-126). Norwood, NJ: Ablex.
Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing, 11, 197-223.
Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263-287.
Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145-178.
Weigle, S. C., Boldt, H., & Valsecchi, M. I. (2003). Effects of task and rater background on the evaluation of ESL writing: A pilot study. TESOL Quarterly, 37(2), 345-354.
Wiggins, G. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco, CA: Jossey-Bass.
Yang, Y. (2001). Chinese interference in English writing: Cultural and linguistic differences. (ERIC Document Reproduction Service No. ED 461 992)

