What Should Be an Effective Assessment? Validity, Reliability, and Stability

Afza.Malik GDA

Effective Data Assessment in Nursing 

Validity, Reliability, and Stability: Assessment Validity


 Effective Assessment in Health Care

    Assessment experts increasingly suggest that in addition to collecting evidence to support the accuracy of inferences made, evidence also should be collected about the intended and unintended consequences of the use of a test (Brookhart & Nitko, 2019; Goodwin, 1997; Goodwin & Goodwin, 1999). Validity does not exist on an all-or-none basis (Miller et al., 2013); there are degrees of validity depending on the purpose of the assessment and how the results are to be used.

    A given assessment may be used for many different purposes, and inferences about the results may have greater validity for one purpose than for another. For example, a test designed to measure knowledge of perioperative nursing guidelines may produce results that have high validity for the purpose of determining certification for perioperative staff nurses, but the results may have low validity for assigning grades to students in a perioperative nursing elective course.

     In addition, validity evidence may change over time, so validation of inferences must not be considered a one-time event. No one assessment will produce results that are perfectly valid for a given purpose. Combining results from several different types of assessments, such as tests, written assignments, and class participation, improves the validity of the decisions made about students’ attainments.

    Moreover, weighting one assessment outcome too heavily in relation to others, such as basing course grades almost exclusively on test scores, results in lowered validity (Brookhart & Nitko, 2019). Validity now is considered a unitary concept (Brookhart & Nitko, 2019; Miller et al., 2013).

    The concept of validity in testing is described in the Standards for Educational and Psychological Testing prepared by a joint committee of the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME). The most recent Standards (2014) no longer includes the view that there are different types of validity (for example, construct, criterion-related, and content).

What Should Be an Effective Assessment:

     How does a teacher know whether a test or another assessment instrument is good? If assessment results will be used to make important educational decisions, such as assigning grades and determining whether students are eligible for graduation, teachers must have confidence in their interpretations of test scores.

    Some high-stakes educational decisions have consequences for faculty members and administrators as well as students. Good assessments produce results that can be used to make appropriate inferences about learners’ knowledge and abilities and thus facilitate effective decision-making. In addition, assessment tools should be practical and easy to use.

Two important questions have been posed to guide the process of constructing or proposing tests and other assessments:

    1. To what extent will the interpretation of the scores be appropriate, meaningful, and useful for the intended application of the results?

    2. What are the consequences of the particular uses and interpretations that are made of the results (Miller, Linn, & Gronlund, 2013, p. 70)? This chapter explains the concept of assessment validity, the role of reliability, and their effects on the interpretive quality of assessment results. It also discusses important practical considerations that might affect the choice or development of tests and other instruments.

    Definitions of validity have changed over time. Early definitions, formed in the 1940s and early 1950s, emphasized the validity of an assessment tool itself. Tests were characterized as valid or not, apart from consideration of how they were used. It was common in that era to support a claim of validity with evidence that a test correlated well with another “true” criterion.

    The concept of validity changed, however, in the 1950s through the 1970s to focus on evidence that an assessment tool is valid for a specific purpose. Most measurement textbooks of that era classified validity by three types (content, criterion-related, and construct) and suggested that validation of a test should include more than one approach.

    In the 1980s, the understanding of validity shifted again, to an emphasis on providing evidence to support the particular inferences that teachers make from assessment results. Validity was defined in terms of the appropriateness and usefulness of the inferences made from assessments, and assessment validation was seen as a process of collecting evidence to support those inferences.

    The usefulness of the validity “triad” also was questioned; increasingly, measurement experts recognized that construct validity was the key element and unifying concept of validity (Goodwin, 1997; Goodwin & Goodwin, 1999). The current philosophy of validity continues to focus not on assessment tools themselves or on the appropriateness of using a test for a specific purpose, but on the meaningfulness of the interpretations that teachers make of assessment results.

    Tests and other assessment instruments yield scores that teachers use to make inferences about how much learners know or what they can do. Validity refers to the adequacy and appropriateness of those interpretations and inferences and how the assessment results are used (Miller et al., 2013). The emphasis is on the consequences of measurement: Does the teacher make accurate interpretations about learners’ knowledge or ability based on their assessment scores?


    Instead of distinct types of validity, there are a variety of sources of evidence to support the validity of the interpretation and use of assessment results. The strongest case for validity can be made when evidence is collected regarding four major considerations for validation:

1. Content

2. Construct

3. Assessment–criterion relationships

4. Consequences

(Miller et al., 2013)
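To illustrate the third consideration, assessment–criterion relationships are commonly quantified as a correlation (a validity coefficient) between assessment scores and an external criterion measure. The sketch below uses hypothetical data (exam scores paired with supervisor ratings of clinical performance; the names and values are illustrative, not from the sources cited above) to compute Pearson's r:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance numerator and the two standard-deviation terms
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sy = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: certification exam scores and later
# supervisor ratings of on-the-job perioperative performance.
exam_scores = [72, 85, 90, 60, 78, 95, 70, 88]
ratings = [3.1, 4.0, 4.5, 2.8, 3.6, 4.8, 3.0, 4.2]

r = pearson_r(exam_scores, ratings)
print(f"Validity coefficient: r = {r:.2f}")
```

A coefficient near 1.0 would support inferences from exam scores to the criterion; a coefficient near 0 would suggest the scores carry little criterion-related evidence for that use.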
