# Test Evaluation Nursing Education

## Interpreting Test Scores

In test evaluation, measures of **central tendency** and **variability** also aid in **interpreting individual scores**.

As a measurement tool, a test results in a score. A number has no intrinsic meaning, however, and must be compared with something meaningful before it can be interpreted. For a test result to be useful in decision making, the teacher must interpret it. Whether interpretations are norm-referenced or criterion-referenced, a basic understanding of statistical concepts is required to assess the quality of tests (whether teacher-made or published), understand standardized test scores, summarize test scores and assessments, and explain test scores to others.

## Test Score Distribution

Some information about how a test performed as a measurement tool can be obtained from computer-generated test and item analysis reports. In addition to providing item analysis data such as difficulty and discrimination indices, these reports often summarize the characteristics of the score distribution. If the teacher does not have access to electronic scoring and test and item analysis software, much of this analysis can be done by hand, albeit more slowly. When a test is scored, the teacher is left with a collection of raw scores, often recorded by student name, in alphabetical order, or by student number. In this form, it is difficult to answer questions such as:

**1. Did a majority of students obtain high or low scores on the test?**

**2. Did any individuals score much higher or much lower than the majority of the students?**

**3. Are the scores widely scattered or grouped together?**

**4. What was the range of scores obtained by the majority of the students? (Brookhart & Nitko, 2019)**

To make similar features of the results easier to identify, the teacher should **rank** the scores from **highest to lowest** (Miller, Linn, & Gronlund, 2013). Arranged this way, it is apparent that the scores ranged from 42 to 60, with one student performing much worse than the others. But the teacher still cannot easily see how a typical student performed on the test or what the general characteristics of the results are.

**Removing the student names, listing each score once, and counting the number of times each score occurs results in a frequency distribution of scores.** This makes it easier for the teacher to see how well the group of students performed on the test. To interpret test results accurately, the teacher must analyze test performance as a whole as well as the individual test items. Information about how the test performed helps teachers give students feedback on test results and improve test items for future use.
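The frequency distribution described above is straightforward to build by hand or in a few lines of code. The sketch below uses a hypothetical set of raw scores (not taken from the text) to illustrate the counting step:

```python
# Build a frequency distribution from a set of raw test scores.
# The scores below are hypothetical, for illustration only.
from collections import Counter

raw_scores = [42, 50, 53, 54, 55, 55, 56, 57, 57, 57,
              58, 58, 58, 59, 59, 60, 60, 60, 60, 60]

# Count how many times each score occurs, then list from highest to lowest.
frequency = Counter(raw_scores)
for score in sorted(frequency, reverse=True):
    print(f"{score}: {frequency[score]}")
```

Listing the distribution from highest to lowest mirrors the ranking step described earlier, so the teacher can see at a glance where scores cluster and where outliers fall.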

A test result is a collection of numbers called raw scores. To make raw values understandable, they can be organized into frequency distributions or plotted as histograms or frequency polygons. Features of the score distribution, such as symmetry, skewness, modality, and kurtosis, can help the teacher understand how the test performs as a measure and help interpret any scores in the distribution.

Measures of central tendency and variability also aid in the interpretation of individual scores. Measures of central tendency include the mode, the median, and the mean; each has advantages and disadvantages. In a normal distribution, these three measures coincide. **Most test score distributions created by teachers do not meet the assumptions of a normal curve. The shape of the distribution can determine the most appropriate index of central tendency to use.**
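The three indices of central tendency can be computed directly with Python's standard `statistics` module. The score set below is hypothetical; note that in this non-normal distribution the mode, median, and mean do not coincide:

```python
# Compute the three measures of central tendency for a hypothetical score set.
import statistics

scores = [42, 50, 53, 54, 55, 55, 56, 57, 57, 57,
          58, 58, 58, 59, 59, 60, 60, 60, 60, 60]

print("mode:", statistics.mode(scores))              # most frequent score
print("median:", statistics.median(scores))          # middle score when ranked
print("mean:", round(statistics.mean(scores), 2))    # arithmetic average
```

Because this distribution is negatively skewed (a low outlier at 42), the mean is pulled below the median and mode, illustrating why the shape of the distribution matters when choosing an index.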

The variability of a distribution can be described roughly as the range of scores or, more precisely, as the standard deviation. Teachers can make criterion-referenced or norm-referenced interpretations of individual student scores. A norm-referenced interpretation of an individual score must take into account the features of the score distribution, some index of central tendency, and some index of variability. The teacher can therefore use the mean and standard deviation to assess how an individual student's score compares with the others.

The percent-correct score is calculated by dividing the raw score by the total possible score; it therefore compares the student's performance against a pre-established standard or criterion. A percent-correct score is not an objective indication of how much a student actually knows about a topic, because it is affected by the difficulty of the test items. The percent-correct score should not be confused with the percentile rank, which describes a student's relative position within a group and is therefore a norm-referenced interpretation.

The percentile rank of a given raw score is the percentage of scores in the distribution that occur at or below that score. Standardized test results are usually reported as percentile ranks or other norm-referenced scores. Teachers should be cautious when interpreting standardized test scores so that comparisons are made with the appropriate norm group. Standardized test scores should not be used to determine grades, and scores should be interpreted with the understanding that only large differences in scores indicate true differences in proficiency levels.
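The contrast between the two interpretations can be made concrete in code. The sketch below implements percent correct (criterion-referenced) and percentile rank (norm-referenced, using the "at or below" definition given above); the score set and point total are hypothetical:

```python
# Criterion-referenced vs. norm-referenced interpretation of one raw score.

def percent_correct(raw, total_possible):
    """Criterion-referenced: raw score divided by total possible points."""
    return 100 * raw / total_possible

def percentile_rank(score, distribution):
    """Norm-referenced: percentage of scores at or below the given score."""
    at_or_below = sum(1 for s in distribution if s <= score)
    return 100 * at_or_below / len(distribution)

# Hypothetical class distribution and test length.
scores = [42, 50, 53, 54, 55, 55, 56, 57, 57, 57,
          58, 58, 58, 59, 59, 60, 60, 60, 60, 60]

raw = 55
print("percent correct:", round(percent_correct(raw, 65), 1))
print("percentile rank:", percentile_rank(raw, scores))
```

The same raw score of 55 yields a fairly high percent correct but a low percentile rank in this group, showing why the two must not be confused.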

Item analysis is usually performed with a computer program, either as part of a test-scoring application or computer testing software. The difficulty index (P), which ranges from 0 to 1.00, indicates the proportion of students who answered the item correctly. Items with P values of 0.20 or less are considered difficult, and items with P values of 0.80 or greater are considered easy.

However, when interpreting the difficulty index, the quality of the instruction and the abilities of the students in the group must be taken into account. The discrimination index (D), which ranges from −1.00 to +1.00, indicates the extent to which high-scoring students answered the item correctly more often than low-scoring students did. In general, the larger the positive value, the better the item discriminates; desirable discrimination indices are at least +0.20.
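Both indices can be computed for a single item once student responses are coded as correct (1) or incorrect (0). The sketch below assumes the common upper-group/lower-group method for D (the difference between the two groups' difficulty indices), with hypothetical response data:

```python
# Item analysis for one test item: difficulty index P and
# discrimination index D (upper-group minus lower-group method).

def difficulty_index(responses):
    """P: proportion of students who answered the item correctly (0 to 1.00)."""
    return sum(responses) / len(responses)

def discrimination_index(upper_responses, lower_responses):
    """D: P for the high-scoring group minus P for the low-scoring group."""
    return difficulty_index(upper_responses) - difficulty_index(lower_responses)

# 1 = answered correctly, 0 = answered incorrectly (hypothetical data).
upper = [1, 1, 1, 1, 0]   # students with the highest total scores
lower = [1, 0, 0, 0, 0]   # students with the lowest total scores

P = difficulty_index(upper + lower)
D = discrimination_index(upper, lower)
print(f"P = {P:.2f}, D = {D:.2f}")
```

Here the item is of moderate difficulty and discriminates well in the desired direction, since high scorers answered it correctly far more often than low scorers.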

The discriminating power of an item depends largely on its difficulty index. An item that is answered correctly by all students has a difficulty index of 1.00; the discrimination index for that item is 0, because there is no difference in performance between high and low scorers. Errors in test construction can affect student results in different ways and therefore need to be handled differently.

If the correct answer to a multiple-choice item is accidentally omitted from the test, no student will be able to answer the item correctly; in this case, the item simply should not be scored. If the error is a misspelled word that does not change the meaning of the item, no adjustment should be made. Teachers should develop a system for maintaining a pool or bank of items from which to select items for future tests. Item banks are often a feature of computer testing programs, or they may be developed by teachers and stored electronically.

The use of published test item banks should be based on the teacher's judgment of the quality of the items, as well as the purpose of the test, the relevant characteristics of the students, and the desired emphasis and balance of content as reflected in the teacher's test plan. Items selected from a published item bank often need to be modified to be technically sound, relevant to the way the content was taught, and appropriate for the characteristics of the students being assessed.
