Validity
Accurate assessments improve the quality of educational decisions, whereas inaccurate assessments do the opposite. That is what validity is about.
Validity refers to the degree to which a test measures what it purports to measure. If a test truly measures what it sets out to measure, then the inferences we make about students based on their test performance are likely to be valid, because we will be interpreting their performance according to what the test's developers set out to measure.
There are three considerations in validity:
- Content-Related
- Criterion-Related
- Construct-Related
Content-Related Evidence
Content-related evidence of validity (often referred to simply as content validity) refers to the adequacy with which the content of a test represents the content of the assessment domain about which inferences are to be made. When the idea of content representativeness was first dealt with by educational measurement folks several decades ago, the focus was predominantly on achievement examinations.
These days, however, the notion of “content” refers to much more than factual knowledge. The content of assessment domains in which educators are interested can embrace knowledge (such as historical facts), skills (such as higher-order thinking competencies), or attitudes (such as students’ dispositions toward the study of science). Content, therefore, should be conceived of broadly. When we determine the content representativeness of a test, the content in the assessment domain being sampled can consist of whatever is in that domain.
How do educators go about gathering content-related evidence of validity? Well, there are generally two approaches to follow in doing so. They are developmental care and external reviews.
1. Developmental Care
One way of trying to make sure that a test’s content adequately taps the content in the assessment domain the test is representing is to employ a set of test-development procedures carefully focused on assuring that the assessment domain’s content is properly reflected in the assessment procedure itself. The higher the stakes associated with the test’s use, the more effort is typically devoted to making certain the assessment procedure’s content properly represents the content in the assessment domain.
As you can see, the important consideration here is that the teacher makes a careful effort to conceptualize an assessment domain, then tries to see if the test being constructed actually contains content that is appropriately representative of the content in the assessment domain. Unfortunately, many teachers generate tests without any regard whatsoever for assessment domains.
2. External Reviews
A second form of content-related evidence of validity for educational assessment procedures involves assembling a panel of judges who rate the content appropriateness of a given test in relation to the assessment domain the test allegedly represents.
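For illustration only, here is a minimal sketch of how a panel's ratings might be summarized. Everything in it is hypothetical: the items, the five judges' ratings, and the 80% endorsement cut-off are assumptions, not a prescribed procedure.

```python
# Hypothetical external-review ratings: for each test item, each of five
# judges records whether the item's content is relevant to the assessment
# domain (1 = relevant, 0 = not relevant). All values are invented.
ratings = {
    "item_1": [1, 1, 1, 1, 1],
    "item_2": [1, 1, 0, 1, 1],
    "item_3": [0, 1, 0, 0, 1],
}

# Summarize the panel's judgments as the proportion of judges endorsing
# each item; items endorsed by at least 80% of judges (an assumed cut-off)
# are treated as content-appropriate, the rest are flagged for review.
for item, votes in ratings.items():
    endorsement = sum(votes) / len(votes)
    verdict = "keep" if endorsement >= 0.8 else "review"
    print(f"{item}: {endorsement:.0%} of judges rate it relevant -> {verdict}")
```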
Criterion-Related Evidence
This kind of evidence also helps educators decide how much confidence can be placed in a score-based inference about a student's status with respect to an assessment domain. Yet a decidedly different evidence-collection strategy is used when we gather criterion-related evidence of validity. Criterion-related evidence is collected only in situations where educators are using an assessment procedure to predict how well students will perform on some subsequent criterion.
If we know a predictor test is working reasonably well, we can use its results to help us make educational decisions about students. Test results, on predictor tests as well as on any other educational assessment procedure, should be used to make better educational decisions.
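To make this concrete, criterion-related evidence is often summarized as a validity coefficient: the correlation between students' scores on the predictor test and their scores on the later criterion measure. The sketch below uses invented numbers purely for illustration; the scores and the variable names are assumptions.

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical data: scores on a predictor test administered early in the
# year and a criterion measure (say, end-of-year grade-point averages)
# collected later. All numbers are invented.
predictor_scores = [52, 61, 47, 70, 58, 66, 49, 73]
criterion_scores = [2.4, 3.1, 2.0, 3.6, 2.8, 3.3, 2.2, 3.8]

# The validity coefficient is the correlation between the two score sets:
# the closer it is to 1.0, the better the predictor test forecasts
# students' performance on the criterion.
validity_coefficient = correlation(predictor_scores, criterion_scores)
print(f"Validity coefficient: {validity_coefficient:.2f}")
```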
Construct-Related Evidence
The last of our three varieties of validity evidence is the most esoteric, yet the most comprehensive. It is referred to as construct-related evidence of validity. It was originally used when psychologists were trying to measure such elusive and covert constructs as individuals’ “anxiety” and “perseverance tendencies”.
Three strategies are most commonly used to gather construct-related evidence: intervention studies, differential-population studies, and related-measures studies.
1. Intervention Studies
In an intervention study, we hypothesize that students will respond differently to the assessment instrument after having received some type of treatment (or intervention); a brief illustrative sketch follows this list.
2. Differential-Population Studies
In this kind of study, based on our knowledge of the construct being measured, we hypothesize that individuals representing distinctly different populations will score differently on the assessment procedure under consideration.
3. Related-Measures Studies
In a related-measures study, we hypothesize that a given kind of relationship will be present between students' scores on the assessment device we are scrutinizing and their scores on a related assessment device.
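As a concrete illustration of the first strategy above, here is a minimal sketch of an intervention study. The students, their pre- and post-instruction scores, and the size of the gain are all invented; a real study would also test whether the observed difference is statistically significant.

```python
# Hypothetical intervention study: the same (invented) students take a
# problem-solving assessment before and after a unit of instruction that,
# by hypothesis, should strengthen the construct being measured.
pre_scores = [41, 55, 38, 62, 47, 50]
post_scores = [58, 63, 49, 71, 60, 66]

def mean(values):
    return sum(values) / len(values)

# If the assessment really taps the construct, the post-intervention mean
# should be noticeably higher than the pre-intervention mean.
gain = mean(post_scores) - mean(pre_scores)
print(f"Mean before instruction: {mean(pre_scores):.1f}")
print(f"Mean after instruction:  {mean(post_scores):.1f}")
print(f"Average gain: {gain:.1f} points")
```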
So we can conclude that validity refers to the adequacy and appropriateness of the interpretations made from assessments with regard to a particular use. The relationship between validity and reliability is sometimes confusing, so it is worth emphasizing that reliability is needed to obtain valid results, but we can have reliability without validity. The distinction between the two concepts is that reliability is concerned with the consistency of assessment results, whereas validity is concerned with the appropriateness of the interpretations made from those results.