Scispace (Formerly Typeset)
Showing papers in "Language Testing in 2022"
Journal Article•10.1177/02655322211057040•
Critical language assessment literacy of EFL teachers: Scale construction and validation

Zia Tajeddin, Mohammad Khatib, Mohsen Mahdavi
07 Jan 2022-Language Testing
TL;DR: The authors developed and validated a critical language assessment literacy (CLAL) scale to underscore the role of CLA principles and their practice as an essential part of teachers' LAL. The scale showed high internal consistency and construct validity, suggesting that it could be useful for assessing language teachers' CLAL and for raising their awareness of CLAL constructs.
Abstract: Critical language assessment (CLA) has been addressed in numerous studies. However, the majority of the studies have overlooked the need for a practical framework to measure the CLA dimension of teachers’ language assessment literacy (LAL). This gap prompted us to develop and validate a critical language assessment literacy (CLAL) scale to further underscore the role of CLA principles and their practice as an essential part of teachers’ LAL. In the first phase, a pool of items was generated through a comprehensive review of the related studies. In the quantitative phase, the developed scale was administered to 255 English as a foreign language teachers selected through convenience and snowball sampling. The data were analyzed through exploratory factor analysis for construct validity and Cronbach’s alpha for estimating internal consistency. The results showed that the items loaded on five factors: (a) teachers’ knowledge of assessment objectives, scopes, and types; (b) assessment use consequences; (c) fairness; (d) assessment policies; and (e) national policy and ideology. It was found that the scale had a high level of internal consistency and construct validity, which suggests that this scale has the potential to be useful in assessing language teachers’ CLAL and to raise language teachers’ awareness of CLAL constructs.

22 citations
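
The internal-consistency check named in the abstract above (Cronbach's alpha) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name and the response matrix are hypothetical, and the only assumption is the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `scores` is a hypothetical 2-D array: one row per respondent,
    one column per scale item.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


# Made-up responses from four respondents to three Likert-type items.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(example)
```

In this toy matrix the items move together perfectly, so alpha reaches its maximum; real questionnaire data would give a lower value.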

Journal Article•10.1177/02655322221086211•
Test Review: The International English Language Testing System (IELTS)

John Read
04 Apr 2022-Language Testing

21 citations

Journal Article•10.1177/02655322211066822•
Assessing Rasch measurement estimation methods across R packages with yes/no vocabulary test data

Christopher Nicklin, Joseph P. Vitta
03 Feb 2022-Language Testing
TL;DR: eRm, a CMLE-based R package, was used to conduct a dichotomous Rasch analysis of a Yes/No vocabulary test based on the academic word list, and the resulting parameters and diagnostic statistics were compared with the equivalent results from four other R-based Rasch measurement software packages and Winsteps.
Abstract: Instrument measurement conducted with Rasch analysis is a common process in language assessment research. A recent systematic review of 215 studies involving Rasch analysis in language testing and applied linguistics research reported that 23 different software packages had been utilized. However, none of the analyses were conducted with one of the numerous R-based Rasch analysis software packages, which generally employ one of the three estimation methods: conditional maximum likelihood estimation (CMLE), joint maximum likelihood estimation (JMLE), or marginal maximum likelihood estimation (MMLE). For this study, eRm, a CMLE-based R package, was utilized to conduct a dichotomous Rasch analysis of a Yes/No vocabulary test based on the academic word list. The resulting parameters and diagnostic statistics were compared with the equivalent results from four other R-based Rasch measurement software packages and Winsteps. Finally, all of the packages were utilized in the analysis of 1000 simulated datasets to investigate the extent to which results generated from the contrasting estimation methods converged or diverged. Overall, the differences between the results produced with the three estimation methods were negligible, and the discrepancies observed between datasets were attributable to the software choice as opposed to the estimation method.

16 citations

Journal Article•10.1177/02655322211064629•
Revisiting English language proficiency and its impact on the academic performance of domestic university students in Singapore

Wenjin Vikki Bo, Mingchen Fu, Wei-Ying Lim
28 Feb 2022-Language Testing
TL;DR: This paper explored the relationship among university students' previous academic experience, English language proficiency, and their current academic performance within a sample of 514 Singaporean students (252 females and 262 males) and found that students' proficiency scores significantly predicted their current grade point average (GPA) with their prior academic performance being controlled.
Abstract: The role of international students’ English language proficiency has been extensively researched to understand its impact on academic achievement in English-medium universities, mainly because of students’ non-English-speaking backgrounds. However, the relationship between language proficiency and academic achievement among English-speaking-background students remains under-researched, especially in multilingual societies, such as Singapore. The present study explored the relationship among university students’ previous academic experience, English language proficiency, and their current academic performance within a sample of 514 Singaporean students (252 females and 262 males). Findings showed that students’ proficiency scores significantly predicted their current grade point average (GPA) with their prior academic performance being controlled. Moreover, proficiency scores significantly strengthened the association between students’ prior academic performance and their current GPA. Finally, academic discipline showed a marginally significant moderating effect in the relationship between proficiency scores and current GPA. Implications and limitations of the study are discussed.

15 citations

Journal Article•10.1177/02655322221112364•
A meta-analysis on the predictive validity of English language proficiency assessments for college admissions

Samuel D. Ihlenfeldt, Joseph A. Rios
16 Aug 2022-Language Testing
TL;DR: The authors meta-analytically synthesized 132 effect sizes from 32 studies containing validity evidence of academic English assessments to determine whether different assessments (a) predicted academic success (as measured by grade point average [GPA]) and (b) did so comparably.
Abstract: For institutions where English is the primary language of instruction, English assessments for admissions such as the Test of English as a Foreign Language (TOEFL) and International English Language Testing System (IELTS) give admissions decision-makers a sense of a student’s skills in academic English. Despite this explicit purpose, these exams have also been used for the practice of predicting academic success. In this study, we meta-analytically synthesized 132 effect sizes from 32 studies containing validity evidence of academic English assessments to determine whether different assessments (a) predicted academic success (as measured by grade point average [GPA]) and (b) did so comparably. Overall, assessments had a weak positive correlation with academic achievement (r = .231, p < .001). Additionally, no significant differences were found in the predictive power of the IELTS and TOEFL exams. No moderators were significant, indicating that these findings held true across school type, school level, and publication type. Although significant, the overall correlation was low; thus, practitioners are cautioned from using standardized English-language proficiency test scores in isolation in lieu of a holistic application review during the admissions process.

15 citations

Journal Article•10.1177/02655322221134218•
But who trains the language teacher educator who trains the language teacher? An empirical investigation of Chilean EFL teacher educators’ language assessment literacy

Salomé Villa Larenas, Tineke Brunfaut
27 Dec 2022-Language Testing
TL;DR: This article investigated the language assessment literacy (LAL) of English as a Foreign Language teacher educators in Chile and identified five LAL components (language assessment knowledge, conceptions, context, practices, and learning) and two by-products of LAL (language assessor identity and self-efficacy).
Abstract: Research has shown that language teachers typically feel underprepared for assessment aspects of their job. One reason may relate to how teacher education programmes prepare future teachers in this area. Research insights into how and to what extent teacher educators train future language teachers in language assessment matters are scarce, however, as are insights into the language assessment literacy (LAL) of the teacher educators themselves. Additionally, while increasingly research insights are available on components that constitute LAL, how such components interrelate is largely unexplored. To help address these research gaps, we investigated the LAL of English as a Foreign Language teacher educators in Chile. Through interviews with 20 teacher educators and analysis of their language assessment materials, five LAL components were identified (language assessment knowledge, conceptions, context, practices, and learning), and two by-products of LAL (language assessor identity and self-efficacy). The components were found to interrelate in a complex manner, which we visualized with a model of concentric oval shapes, depicting how LAL is socially constructed (and re-constructed) from and for the specific context in which teacher educators’ practices are immersed. We discuss implications for LAL conceptualisations and for LAL research methodology.

10 citations

Journal Article•10.1177/02655322221076024•
Developing a local academic English listening test using authentic unscripted audio-visual texts

Ye Na Park, Senyung Lee, Sun-Young Shin
24 Feb 2022-Language Testing
TL;DR: A local academic listening test was developed using authentic unscripted audio-visual texts from the local target language use (TLU) domain without compromising the reliability of the test results or the validity of the score interpretations.
Abstract: Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target language use (TLU) domain without compromising the reliability of the test results and validity of the score interpretations. The purpose of the listening test was to identify international students who need additional language support at a U.S. university. We show that efficiency persists when using authentic unscripted texts that are representative of the local context both at the test development phase and at the classification phase where placement decisions are made in a dependable manner. Expert judgments highlighted the improved correspondence of the listening test using locally sourced audio-visual texts to the local TLU domain, providing additional support for using the listening test for local placement purposes. Additionally, dimensionality assessments demonstrated that test design decisions inevitably entailed with using authentic unscripted texts did not threaten the internal structure of the test. We argue that local resources are indispensable in developing authentic test stimuli and in supporting the validity of local test interpretation and use.

8 citations

Journal Article•10.1177/02655322221092392•
Local tests, local contexts

01 Jul 2022-Language Testing

8 citations

Journal Article•10.1177/02655322211070990•
Bridging local needs and national standards: Use of standards-based individualized feedback of an in-house EFL listening test in China

Shangchao Min, Juanjuan Zhang, Yue Li, Lianzhen He
28 Feb 2022-Language Testing
TL;DR: This article proposed a model for integrating national standards in local contexts to guide assessment, teaching, and learning. A 5-month longitudinal study involving 689 college students examined the consistency between their internal and external assessment feedback (i.e., standards-based self-assessment ratings and standards-based individualized English as a Foreign Language [EFL] listening test feedback) and the effectiveness of a standards-based teaching intervention in enhancing their perceived and actual language development.
Abstract: Local language tests are an arena where national language standards can be operationalized to create a hub for integrating assessment results and language support. Few studies, however, have examined the operationalization of national standards in local language assessment contexts. In this study, we proposed a model to present the integration of national standards in local contexts to guide assessment, teaching, and learning. Using this model, we conducted a 5-month longitudinal study involving 689 college students to examine (1) the consistency between their internal and external assessment feedback (i.e., standards-based self-assessment ratings and standards-based individualized English as a Foreign Language [EFL] listening test feedback) and (2) the effectiveness of standards-based teaching intervention in enhancing their perceived and actual language development. The results showed that the test feedback generally aligned well with students’ self-assessment and perceptions at the overall listening skill and subskill levels, yet student perceptions outlined needs for feedback refinements. In addition, the use of the standards-based individualized feedback, in conjunction with language support courses and practice materials, facilitated students’ perceived and actual listening achievement. This study makes an important contribution to local language testing by demonstrating the potential of a local instrument to provide a bridge between local instructional goals and national standards.

6 citations

Journal Article•10.1177/02655322221075067•
National assessment of foreign languages in Sweden: A multifaceted and collaborative venture

Gudrun Erickson, Linda Borger, Eva Olsson
02 Mar 2022-Language Testing
TL;DR: The authors address the local system of national assessment of foreign languages in Sweden, a contextually specific, large-scale system with a summative aim that is also intended to support teachers in their continuous assessment and grading of their students' competences.
Abstract: The article addresses the local system of national assessment of foreign languages in Sweden, a contextually specific, large-scale system with a summative aim, but also a system aimed to support teachers in their continuous assessment and grading of their students’ competences. In the text, the educational context and the multifaceted nature of national assessment are described and discussed. Furthermore, based on a broad view of validity with use and consequences in focus, different, and partly interwoven, aspects of collaboration in test development are exemplified and discussed, including policy, stakeholders, and research. Special attention is given to contributions by stakeholders, in particular students and teachers. Their involvement is regarded as a central component in the test development process, not only because it widens, deepens, and further develops the competences needed, but also because it increases the possibility to affect and enhance the use of the materials for the justice and beneficence of test-takers and society at large—aspects at the heart of validity. It is emphasized that collaboration requires sensitivity and sensibility from those involved to optimize overall quality and generate reciprocal benefits for all parties.

6 citations

Journal Article•10.1177/02655322211060076•
Roles of working memory, syllogistic inferencing ability, and linguistic knowledge on second language listening comprehension for passages of different lengths

Minkyung Kim, Yunjung Nam, Scott A. Crossley
29 Jan 2022-Language Testing
TL;DR: The authors investigated the effects of working memory capacity (WMC), first language (L1) syllogistic inferencing ability, and second-language (L2) linguistic knowledge on listening comprehension for passages of different lengths.
Abstract: This study investigated the effects of working memory capacity (WMC), first language (L1) syllogistic inferencing ability, and second-language (L2) linguistic knowledge on L2 listening comprehension for passages of different lengths. Participants were 193 Korean ninth-grade learners of English. A path analysis was used to examine multivariate relationships among variables. Findings indicated that L2 linguistic knowledge was pivotal in explaining L2 listening comprehension for passages of different lengths. Findings also indicated that over and above L2 linguistic knowledge, greater WMC facilitated comprehending longer L2 listening passages, while better L1 syllogistic inferencing ability facilitated comprehending shorter L2 listening passages. WMC may help form more information-dense representations of longer passages, while L1 syllogistic inferencing ability may help build integrated propositional representations of shorter passages. In addition, greater WMC had indirect impacts on L2 listening comprehension through L1 syllogistic inferencing ability and L2 linguistic knowledge, which suggests that WMC may lead to better L2 listening comprehension when learners have greater L2 knowledge or better L1 syllogistic inferencing ability. Overall, first, this study suggests a pivotal role for L2 linguistic knowledge in L2 listening comprehension and, second, roles for WMC and L1 syllogistic inferencing, which function differently depending on passage length.
Journal Article•10.1177/02655322211057868•
Register variation in spoken and written language use across technology-mediated and non-technology-mediated learning environments

Kristopher Kyle, Masaki Eguchi, Ann Tai Choe, Geoff LaFlair
20 Feb 2022-Language Testing
TL;DR: Building on an earlier multidimensional analysis that found both similarities and substantive differences across learning environments but did not examine the effects of particular registers, the authors investigated lexicogrammatical features of specific spoken and written registers across technology-mediated and non-technology-mediated learning environments.
Abstract: In the realm of language proficiency assessments, the domain description inference and the extrapolation inference are key components of a validity argument. Biber et al.’s description of the lexicogrammatical features of the spoken and written registers in the T2K-SWAL corpus has served as support for the TOEFL iBT test’s domain description and extrapolation inferences. In the time since the T2K-SWAL corpus was collected, however, university learning environments have increasingly become technology-mediated. Accordingly, any description of the linguistic features of university language should account for the language produced in technology-mediated learning environments (TMLEs) in addition to non-technology-mediated learning environments (non-TMLEs). Kyle et al. recently began to address this issue by collecting a corpus of TMLE language use, which they then compared to language use in non-TMLEs using multidimensional analysis (MDA). The results indicated both similarities and substantive differences across the learning environments, but the study did not investigate the effects of particular registers on these results. In this study, we build on previous research by investigating lexicogrammatical features of specific spoken and written registers across technology-mediated and non-technology-mediated learning environments.
Journal Article•10.1177/02655322221113917•
Comparing holistic and analytic marking methods in assessing speech act production in L2 Chinese

Shuai Li, Ting-hui Wen, Xian Li, Yali Feng, Chuan Lin 
09 Aug 2022-Language Testing
TL;DR: The authors compared holistic and analytic marking methods for their effects on parameter estimation (of examinees, raters, and items) and rater cognition in assessing speech act production in L2 Chinese.
Abstract: This study compared holistic and analytic marking methods for their effects on parameter estimation (of examinees, raters, and items) and rater cognition in assessing speech act production in L2 Chinese. Seventy American learners of Chinese completed an oral Discourse Completion Test assessing requests and refusals. Four first-language (L1) Chinese raters evaluated the examinees’ oral productions using two four-point rating scales. The holistic scale simultaneously included the following five dimensions: communicative function, prosody, fluency, appropriateness, and grammaticality; the analytic scale included sub-scales to examine each of the five dimensions. The raters scored the dataset twice with the two marking methods, respectively, and with counterbalanced order. They also verbalized their scoring rationale after performing each rating. Results revealed that both marking methods led to high reliability and produced scores with high correlation; however, analytic marking possessed better assessment quality in terms of higher reliability and measurement precision, higher percentages of Rasch model fit for examinees and items, and more balanced reference to rating criteria among raters during the scoring process.
Journal Article•10.1177/02655322211062138•
Psychometric approaches to analyzing C-tests

David Alpizar, Tongyun Li, John M. Norris, Lixiong Gu
28 Feb 2022-Language Testing
TL;DR: This article examined the local item independence assumption via multidimensional item response theory (IRT) models, Yen's Q3, and Jackknife Slope Index, and evaluated several IRT models to determine optimal approaches to scoring the C-test.
Abstract: The C-test is a type of gap-filling test designed to efficiently measure second language proficiency. The typical C-test consists of several short paragraphs with the second half of every second word deleted. The words with deleted parts are considered as items nested within the corresponding paragraph. Given this testlet structure, it is commonly taken for granted that the C-test design may violate the local independence assumption. However, this assumption has not been fully investigated in the C-test research to date, including the evaluation of alternative psychometric models (i.e., unidimensional and multidimensional) to calibrate and score the C-test. This study addressed each of these issues using a large data set of responses to an English-language C-test. First, we examined the local item independence assumption via multidimensional item response theory (IRT) models, Yen’s Q3, and Jackknife Slope Index. Second, we evaluated several IRT models to determine optimal approaches to scoring the C-test. The results support an interpretation of unidimensionality for the C-test items within a paragraph, with only minor evidence of local item dependence. Furthermore, the two-parameter logistic (2PL) IRT model was found to be the most appropriate model for calibrating and scoring the C-test. Implications for designing, scoring, and analyzing C-tests are discussed.
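
For readers unfamiliar with the two-parameter logistic (2PL) model mentioned in the abstract above, the item response function can be sketched as follows. This illustrates the standard 2PL formula only, not the calibration procedure used in the study; the function name and parameter values are hypothetical.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model.

    theta: examinee ability
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.2 and difficulty 0.5.
p_low = p_correct_2pl(theta=-1.0, a=1.2, b=0.5)
p_high = p_correct_2pl(theta=2.0, a=1.2, b=0.5)
```

When ability equals difficulty the probability is 0.5, and fixing a = 1 for every item reduces the expression to the Rasch model.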
Journal Article•10.1177/02655322211063785•
Development and initial validation of productive vocabulary tests for isiZulu, Siswati and English in South Africa

Carien Wilsenach, Maxine N. Schaefer
08 Jan 2022-Language Testing
TL;DR: The authors document the development of corpus-informed, contextually appropriate tests of productive vocabulary in isiZulu, Siswati, and English, which were used for a project evaluation.
Abstract: Multilingualism in education is encouraged in South Africa, and children are expected to become bilingual and biliterate during the early primary grades. Much focus has been placed on measuring literacy in children’s first language, often the medium of instruction (MOI), and English, the language typically used as MOI from fourth grade. However, vocabulary development in African contexts is underexplored, owing to the cost of existing English standardized tests, and the comparatively fewer linguistically and contextually appropriate vocabulary assessments in African languages. To address this gap, we document the development of corpus-informed contextually appropriate tests of productive vocabulary in isiZulu, Siswati, and English, which were used for a project evaluation. The initial validation phase included 412 children. Both tests were reliable and were concurrently validated with reading comprehension tests in each language, and oral language skills in English. This study contributes to our understanding of the factors that affect the variation in vocabulary knowledge in an African context, including age, grade repetition, and vocabulary in the other language. Only English vocabulary was affected by the remote rural location of the school. We recommend some modifications to the tests before they are validated further in other populations.
Journal Article•10.1177/02655322211068847•
IRT-based classification analysis of an English language reading proficiency subtest

Elif Kaya, Stefan O’Grady, Ilker Kalender
27 Jan 2022-Language Testing
TL;DR: The authors investigated the classification performance of computerized adaptive testing (CAT) on the reading section of an English language proficiency test, compared it with the paper-based version of the same test, and found that classification was suitable when a single cutoff score was used, particularly for high- and low-ability test takers.
Abstract: Language proficiency testing serves an important function of classifying examinees into different categories of ability. However, misclassification is to some extent inevitable and may have important consequences for stakeholders. Recent research suggests that classification efficacy may be enhanced substantially using computerized adaptive testing (CAT). Using real data simulations, the current study investigated the classification performance of CAT on the reading section of an English language proficiency test and made comparisons with the paper-based version of the same test. Classification analysis was carried out to estimate classification accuracy (CA) and classification consistency (CC) by applying different locations and numbers of cutoff points. The results showed that classification was suitable when a single cutoff score was used, particularly for high- and low-ability test takers. Classification performance declined significantly when multiple cutoff points were simultaneously employed. Content analysis also raised important questions about construct coverage in CAT. The results highlight the potential for CAT to serve classification purposes and outline avenues for further research.
Journal Article•10.1177/02655322221113189•
Book Review: Challenges in Language Testing Around the World: Insights for Language Test Users

Atta Gebril
28 Jul 2022-Language Testing
TL;DR: This book review presents Challenges in Language Testing Around the World: Insights for Language Test Users by Betty Lanteigne, Christine Coombe, and James Dean Brown as a good addition to the existing body of knowledge, since it offers a closer look at "things that could get overlooked, misapplied, misinterpreted, misused" in different assessment projects.
Abstract: With the increasing role of tests worldwide, language professionals and other stakeholders are regularly involved in a wide range of assessment-related decisions in their local contexts. Such decisions vary in terms of the stakes associated with them, with many involving high-stakes decisions. Regardless of the nature of the stakes, assessment contexts tend to share something in common: the challenges that test users encounter on a daily basis. To make things even worse, many test users operate in an instructional setting with little knowledge about assessment. Taylor (2009) refers to the lack of assessment literacy materials that are accessible to different stakeholders, arguing that such materials are “highly technical or too specialized for language educators seeking to understand basic principles and practice in assessment” (p. 23). On a related note, assessment literacy training tends to be offered in a one-size-fits-all manner and does not tap into the unique characteristics of local contexts. This view is in contradiction with what different researchers reported in the literature since assessment literacy is perceived as “a social and co-constructed construct,” “no longer viewed as passive accumulation of knowledge and skills” (Yan & Fan, 2021, p. 220), and tends to be impacted by a number of contextual factors, such as linguistic background and teaching experience (Crusan et al., 2016). In light of these issues, the current volume taps into the existing challenges in different assessment/instructional settings. It is rare in our field to find a volume dedicated mainly to challenges in different assessment/instructional settings. Usually, there is a general sense that practitioners do not prefer such a negative tone when reading or writing about language assessment practices. 
In addition, practitioners generally do not have the incentives and resources needed for publishing, nor do they have access to a suitable platform for sharing such experiences. Challenges in Language Testing Around the World: Insights for Language Test Users by Betty Lanteigne, Christine Coombe, and James Dean Brown is a good addition to the existing body of knowledge since it offers a closer look at “things that could get overlooked, misapplied, misinterpreted, misused” in different assessment projects (Lanteigne et al., 2021, p. v.). Another perspective that the authors have to be commended on is related to the international nature of the experiences reported in this volume.
Journal Article•10.1177/02655322211070840•
The use of generalizability theory in investigating the score dependability of classroom-based L2 reading assessment

Ray J. T. Liao
28 Feb 2022-Language Testing
TL;DR: The author used generalizability theory to investigate the score reliability of the multiple-choice (MC) format in classroom-based L2 reading tests and found that reliability was critically influenced by the number of items and passages, with different combinations of the two altering the degree of reliability.
Abstract: Among the variety of selected response formats used in L2 reading assessment, multiple-choice (MC) is the most commonly adopted, primarily due to its efficiency and objectiveness. Given the impact of assessment results on teaching and learning, it is necessary to investigate the degree to which the MC format reliably measures learners’ L2 reading comprehension in the classroom context. While researchers have claimed that the longer the reading test (i.e., more test items and passages), the higher its overall reliability, few studies have investigated the optimal number of items and passages required for reliable classroom-based L2 reading assessment. To address this research gap, I adopted generalizability (G) theory to investigate the score reliability of the MC format in classroom-based L2 reading tests. A total of 108 ESL students at an American college completed an English reading test that included four passages, each of which was accompanied by five MC comprehension questions. The results showed that the score reliability of the L2 reading test was critically influenced by the number of items and passages, inasmuch as a different combination of the number of passages and items altered the degree of reliability. Implications for practitioners and educational researchers are discussed.
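
The dependence of reliability on the number of passages and items described above can be sketched with the textbook generalizability coefficient for a design in which items are nested within passages and crossed with persons. The design choice, the function name, and the variance components below are assumptions made for illustration; none of them come from the study.

```python
def g_coefficient(var_person, var_person_passage, var_residual,
                  n_passages, n_items_per_passage):
    """Generalizability coefficient for a persons x (items : passages) design.

    Relative error variance shrinks as passages and items are added:
        var_person_passage / n_passages
        + var_residual / (n_passages * n_items_per_passage)
    """
    rel_error = (var_person_passage / n_passages
                 + var_residual / (n_passages * n_items_per_passage))
    return var_person / (var_person + rel_error)


# Hypothetical variance components (not taken from the study).
components = dict(var_person=0.5, var_person_passage=0.1, var_residual=0.8)

# Four passages with five items each, matching the test layout in the abstract.
g_4x5 = g_coefficient(**components, n_passages=4, n_items_per_passage=5)

# The same 20 items packed into two passages of ten items each.
g_2x10 = g_coefficient(**components, n_passages=2, n_items_per_passage=10)
```

Under these made-up components, spreading the same 20 items over more passages gives the higher coefficient, which is the kind of trade-off a decision study explores.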
Journal Article•10.1177/02655322221118069•
Book Review: Assessing Academic English for Higher Education Admissions

[...]

Shizheng Liu
24 Aug 2022-Language Testing
TL;DR: A review of the book Assessing Academic English for Higher Education Admissions.
Abstract: This article reviews and
Journal Article•10.1177/02655322221140331•
Book Review: The Routledge Handbook of Language Testing

[...]

John M. Norris
16 Dec 2022-Language Testing
TL;DR: The second edition of The Routledge Handbook of Language Testing, published in 2022, is a hefty volume, covering a broad swath of theory, research, and practice in language testing over some 600+ pages.
Abstract: The second edition of The Routledge Handbook of Language Testing, published in 2022, is a hefty volume, covering a broad swath of theory, research, and practice in language testing over some 600+ pages. Editors Glenn Fulcher and Luke Harding have done a nice job of updating the first edition, bringing in a handful of new contributions for a total of 36 chapters, re-arranging the organization somewhat to collocate topics thematically, and encouraging revisions to nearly all of the included chapters. Compiling edited volumes, never mind substantial handbooks that are intended to reflect the entire field, like this one, is never an easy or straightforward endeavor. Choices inevitably must be made about which experts to invite, what topics to include and which ones to leave out, and how to arrange the contents and situate the contributions against the backdrop of an active and evolving domain of research and practice. On the whole, this book does a good job of reflecting a lot of what is on the minds of language testing researchers and practitioners as they go about the scholarship and business of language assessment, and it does so in a reader-friendly way, with relatively brief and consistently organized chapters produced by an impressive group of experts. I believe these characteristics recommend the book for use in seminars on language testing and as an authoritative reference for a variety of language testing stakeholders—indeed, many of these chapters will help in the cause of advancing language assessment literacy in multiple sectors (if we can only encourage their being read by individuals in those sectors . . .). In the following, I highlight a few dimensions of the volume that I find particularly useful and/or insightful, and I offer some observations on aspects that might have deserved more attention or perhaps should merit attention in the next edition.
The book is arranged in 10 topical sections with three to five chapters each, fronted by a brief editorial introduction and ending with a subject and author index. In the introduction, the editors do a nice job of rationalizing the different sections of the book and introducing the key contributions of the distinct chapters. They also effectively link core ideas and themes that transcend individual chapters, thereby helping readers to notice important threads that connect the different perspectives and issues covered. Dispensing with one production quibble up front, the Index is not well compiled. While no doubt a challenge with so many contributing authors and such wide-ranging contents, a good index is all the more important for a big book like this one. Yet this index has numerous
Journal Article•10.1177/02655322221114895•
Book Review: Multilingual Testing and Assessment

[...]

Beverly Baker
15 Aug 2022-Language Testing
TL;DR: A review of a volume that, from both a theoretical and an empirical perspective, addresses the challenges of testing learners of multiple school languages and is intended as a non-technical resource offering help and guidance to all those who work in education with multilingual populations.
Abstract: From both a theoretical and an empirical perspective, this volume addresses the challenges of testing learners of multiple school language(s). The author states that “This volume is intended as a non-technical resource to offer help and guidance to all those who work in education with multilingual populations” (p. 1). In that sense, it is not a book about the assessment of language per se (although she presents a research study in which she collects information on students’ language proficiency). Rather, it is intended primarily for non-language specialists; to educators working with multilingual learners of all subjects. As she states throughout the work, the author addresses what she sees as limitations in both theoretical and empirical work that consider two languages only, claiming that this work has limited insights for those working with speakers of more than two languages. The author is motivated by the fair assessment of all students, including linguistically and culturally minoritized students. What follows are a summary and critical comments of the book, beginning with an overview of each of the chapters then directing a critical commentary to a few chapters in particular (Chapters 2, 5, and 7). Given the repetition of ideas across the chapters, I assume that many of these chapters have been designed to be read on a stand-alone basis. I have chosen these chapters to focus my comments because in my view they form the core of the book—they contain the theoretical approach undergirding the author’s work, the practical guidance in the form of the author’s “integrated approach,” and the details of her empirical study.
Journal Article•10.1177/02655322221092388•
A sequential approach to detecting differential rater functioning in sparse rater-mediated assessment networks

[...]

Stefanie A. Wind
12 May 2022-Language Testing
TL;DR: In this paper, a sequential method is adapted from previous research on differential item functioning (DIF) that allows researchers to detect differential rater functioning (DRF) more accurately and distinguish between true and artificial DRF.
Abstract: Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting DRF may be limited in sparse rating designs, where it is not possible for every rater to score every student. In these designs, there is limited information with which to detect DRF. Sparse designs can also exacerbate the impact of artificial DRF, which occurs when raters are inaccurately flagged for DRF due to statistical artifacts. In this study, a sequential method is adapted from previous research on differential item functioning (DIF) that allows researchers to detect DRF more accurately and distinguish between true and artificial DRF. Analyses of data from a rater-mediated writing assessment and a simulation study demonstrate that the sequential approach results in different conclusions about which raters exhibit DRF. Moreover, the simulation study results suggest that the sequential procedure results in improved accuracy in DRF detection across a variety of rating design conditions. Practical implications for language testing research are discussed.
Journal Article•10.1177/02655322221085216•
Book review: Another Generation of Fundamental Considerations in Language Assessment: A Festschrift in Honor of Lyle F. Bachman

[...]

Ying Xu, Xiaodong Li
01 Apr 2022-Language Testing
Journal Article•10.1177/02655322221077081•
Book Review: Assessing Speaking in Context—Expanding the Construct and its Applications

[...]

Lynda Taylor
16 Feb 2022-Language Testing
Journal Article•10.1177/02655322221122774•
Challenges in rating signed production: A mixed-methods study of a Swiss German Sign Language form-recall vocabulary test

[...]

Aaron Olaf Batty, Tobias Haug, Sarah Ebling, Katja Tissi, Sandra Sidler-Miserez 
21 Sep 2022-Language Testing
TL;DR: In this article, a mixed-methods study of a human-rated form-recall sign vocabulary test of 98 signs for beginning adult learners of Swiss German Sign Language (DSGS), using post-test qualitative rater interviews, is presented.
Abstract: Sign languages present particular challenges to language assessors in relation to variation in signs, weakly defined citation forms, and a general lack of standard-setting work even in long-established measures of productive sign proficiency. The present article addresses and explores these issues via a mixed-methods study of a human-rated form-recall sign vocabulary test of 98 signs for beginning adult learners of Swiss German Sign Language (DSGS), using post-test qualitative rater interviews to inform interpretation of the results of quantitative analysis of the test ratings using many-facets Rasch measurement. Significant differences between two expert raters were observed on three signs. The follow-up interview revealed disagreement on the criterion of correctness, despite the raters’ involvement in the development of the base lexicon of signs. The findings highlight the challenges of using human ratings to assess the production not only of sign language vocabulary, but of minority languages generally, and underscore the need for greater effort expended on the standardization of sign language assessment.
Journal Article•10.1177/02655322221140012•
Book Review: An Introduction to the Rasch Model with Examples in R

[...]

Zhiqing Lin, Huilin Chen
16 Dec 2022-Language Testing
Journal Article•10.1177/02655322221080896•
Book Review: Scoring Second Language Spoken and Written Performance: Issues, Options and Directions

[...]

03 Mar 2022-Language Testing
Journal Article•10.1177/02655322221100115•
Who succeeds and who fails? Exploring the role of background variables in explaining the outcomes of L2 language tests

[...]

Ann-Kristin Helland Gujord
24 Jul 2022-Language Testing
TL;DR: This paper explored whether and to what extent the background information supplied by 10,155 immigrants who took an official language test in Norwegian affected their chances of passing one, two, or all three parts of the test.
Abstract: This study explores whether and to what extent the background information supplied by 10,155 immigrants who took an official language test in Norwegian affected their chances of passing one, two, or all three parts of the test. The background information included in the analysis was prior education, region (location of their home country), language (first language [L1] background, knowledge of English), second language (hours of second language [L2] instruction, L2 use), L1 community (years of residence, contact with L1 speakers), age, and gender. An ordered logistic regression analysis revealed that eight of the hypothesised explanatory variables significantly impacted the dependent variable (test result). Several of the significant variables relate to pre-immigration conditions, such as educational opportunities earlier in life. The findings have implications for language testing and also, to some extent, for the understanding of variation in learning outcomes.
Journal Article•10.1177/02655322211053212•
Innovation and expansion in Language Testing for changing times

[...]

Luke Harding, Paula Winke
01 Jan 2022-Language Testing
Journal Article•10.1177/02655322221114015•
L2 English vocabulary breadth and knowledge of derivational morphology: One or two constructs?

[...]

Dmitri Leontjev, Ari-Pekka Huhta, Asko Tolvanen
02 Sep 2022-Language Testing
TL;DR: The authors conducted two confirmatory factor analyses, one with a single underlying factor and the other treating vocabulary breadth and derivational morphology (DM) as separate, and found that the vocabulary breadth factor still explained a substantial amount of variance in learners' performance on DM measures.
Abstract: Derivational morphology (DM) and how it can be assessed have been investigated relatively rarely in language learning and testing research. The goal of this study is to add to the understanding of the nature of DM knowledge, exploring whether and how it is separable from vocabulary breadth. Eight L2 (second or foreign language) English DM knowledge measures and three measures of the size of the English vocabulary were administered to 120 learners. We conducted two confirmatory factor analyses, one with one underlying factor and the other treating vocabulary breadth and DM as separate. As neither model had a satisfactory fit without introducing a residual covariance to the two-factor model, we conducted an exploratory factor analysis, which suggested two separate DM factors in addition to vocabulary breadth. Regardless, the analysis demonstrated that the DM knowledge was separate from learners’ vocabulary breadth. However, learners’ vocabulary breadth factor still explained a substantial amount of variance in learners’ performance on DM measures. We discuss theoretical implications and implications for L2 assessment.
