TL;DR: This book discusses the role of measurement in the social sciences and proposes guidelines for scale development. However, the authors do not discuss the relationship between scale scores and scale length.
Abstract: Chapter 1: Overview (General Perspectives on Measurement; Historical Origins of Measurement in Social Science; Later Developments in Measurement; The Role of Measurement in the Social Sciences; Summary and Preview). Chapter 2: Understanding the "Latent Variable" (Constructs Versus Measures; Latent Variable as the Presumed Cause of Item Values; Path Diagrams; Further Elaboration of the Measurement Model; Parallel "Tests"; Alternative Models; Exercises). Chapter 3: Reliability (Continuous Versus Dichotomous Items; Internal Consistency; Reliability Based on Correlations Between Scale Scores; Generalizability Theory; Summary and Exercises). Chapter 4: Validity (Content Validity; Criterion-related Validity; Construct Validity; What About Face Validity?; Exercises). Chapter 5: Guidelines in Scale Development (Step 1: Determine Clearly What it Is You Want to Measure; Step 2: Generate an Item Pool; Step 3: Determine the Format for Measurement; Step 4: Have Initial Item Pool Reviewed by Experts; Step 5: Consider Inclusion of Validation Items; Step 6: Administer Items to a Development Sample; Step 7: Evaluate the Items; Step 8: Optimize Scale Length; Exercises). Chapter 6: Factor Analysis (Overview of Factor Analysis; Conceptual Description of Factor Analysis; Interpreting Factors; Principal Components vs Common Factors; Confirmatory Factor Analysis; Using Factor Analysis in Scale Development; Sample Size; Conclusion). Chapter 7: An Overview of Item Response Theory (Item Difficulty; Item Discrimination; False Positives; Item Characteristic Curves; Complexities of IRT; When to Use IRT; Conclusions). Chapter 8: Measurement in the Broader Research Context (Before the Scale Development; After the Scale Administration; Final Thoughts). References. Index. About the Author.
TL;DR: Nurse researchers use both methods of computing the scale-level content validity index (S-CVI), although the calculation method could not always be inferred from the published studies.
Abstract: Scale developers often provide evidence of content validity by computing a content validity index (CVI), using ratings of item relevance by content experts. We analyzed how nurse researchers have defined and calculated the CVI, and found considerable consistency for item-level CVIs (I-CVIs). However, there are two alternative, but unacknowledged, methods of computing the scale-level index (S-CVI). One method requires universal agreement among experts, but a less conservative method averages the item-level CVIs. Using backward inference with a purposive sample of scale development studies, we found that both methods are being used by nurse researchers, although it was not always possible to infer the calculation method. The two approaches can lead to different values, making it risky to draw conclusions about content validity. Scale developers should indicate which method was used to provide readers with interpretable content validity information.
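The difference between the two scale-level calculations can be made concrete with a small sketch. The abstract does not give the rating scale or the cutoff, so the code below assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant"; the function names and the expert ratings are hypothetical and serve only as illustration.

```python
from fractions import Fraction


def item_cvi(ratings, relevant_min=3):
    """I-CVI: proportion of experts who rated the item as relevant.

    Assumption: a 4-point relevance scale where ratings >= 3 count as relevant.
    """
    return Fraction(sum(r >= relevant_min for r in ratings), len(ratings))


def scale_cvi_universal(items, relevant_min=3):
    """S-CVI, universal-agreement method: proportion of items that
    every expert rated as relevant (I-CVI equal to 1)."""
    agreed = sum(item_cvi(r, relevant_min) == 1 for r in items)
    return Fraction(agreed, len(items))


def scale_cvi_average(items, relevant_min=3):
    """S-CVI, averaging method: mean of the item-level CVIs."""
    return sum(item_cvi(r, relevant_min) for r in items) / len(items)


# Hypothetical ratings: 4 items (rows), each rated by 4 experts (columns).
ratings = [
    [4, 4, 3, 4],  # all experts rate the item relevant -> I-CVI = 1
    [4, 3, 2, 4],  # one expert disagrees               -> I-CVI = 3/4
    [3, 4, 4, 3],  # all experts rate the item relevant -> I-CVI = 1
    [2, 4, 3, 3],  # one expert disagrees               -> I-CVI = 3/4
]

print(float(scale_cvi_universal(ratings)))  # 0.5
print(float(scale_cvi_average(ratings)))    # 0.875
```

With the same ratings, the universal-agreement method yields 0.5 and the averaging method yields 0.875, which shows why a reported S-CVI is hard to interpret unless the calculation method is stated.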
TL;DR: This book presents a theoretical framework of communicative language ability and a framework of test method facets, applies them to language testing, and outlines a general model for explaining performance on language tests.
Abstract: Preface. 1. Introduction (The aims of the book; The climate for language testing; Research and development: needs and problems; Research and development: an agenda; Overview of the book; Notes). 2. Measurement (Introduction; Definition of terms: measurement, test, evaluation; Essential measurement qualities; Properties of measurement scales; Characteristics that limit measurement; Steps in measurement; Summary; Notes; Further reading; Discussion questions). 3. Uses of Language Tests (Introduction; Uses of language tests in educational programs; Research uses of language tests; Features for classifying different types of language test; Summary; Further reading; Discussion questions). 4. Communicative Language Ability (Introduction; Language proficiency and communicative competence; A theoretical framework of communicative language ability; Summary; Notes; Further reading; Discussion questions). 5. Test Methods (Introduction; A framework of test method facets; Applications of this framework to language testing; Summary; Notes; Further reading; Discussion questions). 6. Reliability (Introduction; Factors that affect language test scores; Classical true score measurement theory; Generalizability theory; Standard error of measurement: Interpreting individual test scores within classical true score and generalizability theory; Item response theory; Reliability of criterion-referenced test scores; Factors that affect reliability estimates; Systematic measurement error; Summary; Notes; Further reading; Discussion questions). 7. Validation (Introduction; Reliability and validity revisited; Validity as a unitary concept; The evidential basis of validity; Test bias; The consequential or ethical basis of validity; Post mortem: face validity; Summary; Notes; Further reading; Discussion questions). 8. Some Persistent Problems and Future Directions (Introduction; Authentic language tests; Some future directions; A general model for explaining performance on language tests; Apologia et prolegomenon; Summary; Notes; Further reading; Discussion questions). Bibliography. Author index. Subject index.
TL;DR: The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes.
Abstract: The aim of the COSMIN study (COnsensus-based Standards for the selection of health status Measurement INstruments) was to develop a consensus-based checklist to evaluate the methodological quality of studies on measurement properties. We present the COSMIN checklist and the agreement of the panel on the items of the checklist. A four-round Delphi study was performed with international experts (psychologists, epidemiologists, statisticians and clinicians). Of the 91 invited experts, 57 agreed to participate (63%). Panel members were asked to rate their (dis)agreement with each proposal on a five-point scale. Consensus was considered to be reached when at least 67% of the panel members indicated ‘agree’ or ‘strongly agree’. Consensus was reached on the inclusion of the following measurement properties: internal consistency, reliability, measurement error, content validity (including face validity), construct validity (including structural validity, hypotheses testing and cross-cultural validity), criterion validity, responsiveness, and interpretability. The latter was not considered a measurement property. The panel also reached consensus on how these properties should be assessed. The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes.
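The consensus rule described in the abstract can be sketched in a few lines. The abstract gives the 67% threshold and the participation figures (57 of 91), but not how the five-point scale was coded, so the code assumes ratings coded 1 to 5 with 4 = "agree" and 5 = "strongly agree"; the function name and the example ratings are hypothetical.

```python
def consensus_reached(ratings, threshold=0.67, agree_min=4):
    """Return True when at least `threshold` of the panel agrees.

    Assumption: ratings are coded 1-5, with 4 = 'agree' and 5 = 'strongly agree'.
    """
    agreeing = sum(r >= agree_min for r in ratings)
    return agreeing / len(ratings) >= threshold


# Participation rate reported in the abstract: 57 of 91 invited experts.
participation = 57 / 91
print(f"{participation:.0%}")  # 63%

# Hypothetical ratings from nine panel members for two proposals.
print(consensus_reached([5, 4, 4, 3, 5, 4, 2, 4, 5]))  # True  (7 of 9 agree)
print(consensus_reached([5, 4, 3, 3, 2, 4, 3, 2, 5]))  # False (4 of 9 agree)
```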