and Test Construction
Problem
I am a graduate student conducting a research project for my thesis. I can't wait to graduate! I would like to find out whether my instrument is reliable before proceeding with my experiment. I have heard about using alternate forms and test-retest to estimate reliability, but due to a lack of resources I cannot afford to write two tests or administer the same test at two different times. With only one test result, what should I do to evaluate the reliability of my measurement tool?
Which reliability coefficients should I use?
You may compute the Cronbach Coefficient Alpha, a Kuder-Richardson (KR) formula, or the Split-half Reliability Coefficient to check the internal consistency within a single test. Cronbach Alpha is recommended over the other two: the KR formulas apply only to dichotomously scored (right/wrong) items, and a split-half estimate depends on how the items happen to be split, whereas Cronbach Alpha handles both dichotomous and multi-point items and uses the information from every item.
What is the Cronbach Alpha Coefficient?
"OK, Cronbach Alpha is good. But what is Cronbach Alpha?" The Cronbach Alpha coefficient was developed by Professor Lee Cronbach, of course. It is a measure of the squared correlation between observed scores and true scores. Put another way, reliability is measured in terms of the ratio of true score variance to observed score variance. "Wow! Sounds very technical. My committee will like that. But what does it mean?"
The theory behind it is that the observed score equals the true score plus measurement error (Y = T + E). For example, I know 80% of the material but my score is 85% because of lucky guessing. In this case, my observed score is 85 while my true score is 80; the additional five points are due to measurement error. A reliable test should minimize the measurement error so that the error is not highly correlated with the true score. On the other hand, the relationship between the true score and the observed score should be strong. Cronbach Alpha examines this relationship.
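The ratio of true-score variance to observed-score variance can be illustrated with a quick simulation (the score distributions below are hypothetical, chosen only so that the ratio works out to a round number):

```python
import numpy as np

rng = np.random.default_rng(42)
true = rng.normal(70, 10, size=100_000)   # true scores, SD = 10
error = rng.normal(0, 5, size=100_000)    # measurement error, SD = 5
observed = true + error                   # Y = T + E

# Reliability = true-score variance / observed-score variance.
# When error is uncorrelated with the true score, this is
# approximately 100 / (100 + 25) = 0.8 for these SDs.
reliability = true.var() / observed.var()
```

The closer this ratio is to 1, the less the observed scores are contaminated by measurement error.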
How to compute Cronbach Alpha
Either SAS or SPSS can perform this analysis; SAS is the better choice because its output is more detailed. The following illustration is based upon data from the Eruditio project, which is sponsored by U.S. West Communications. In SAS, Cronbach Alpha is requested with the ALPHA option of the PROC CORR procedure.
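The quantity SAS reports can be sketched in Python (a minimal implementation with hypothetical item data, not the Eruditio data set): the raw coefficient is built from item variances and the total-score variance, while the standardized coefficient is built from the mean inter-item correlation.

```python
import numpy as np

def cronbach_alpha(items):
    """Raw and standardized Cronbach Alpha for an
    (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    # Raw alpha: item variances versus the variance of the total score
    raw = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                           / items.sum(axis=1).var(ddof=1))
    # Standardized alpha: mean off-diagonal inter-item correlation
    corr = np.corrcoef(items, rowvar=False)
    r_bar = (corr.sum() - k) / (k * (k - 1))
    std = k * r_bar / (1 + (k - 1) * r_bar)
    return raw, std

# Five respondents answering three right (1) / wrong (0) items
scores = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 0]])
raw, std = cronbach_alpha(scores)
```

When all items are perfectly parallel (every respondent answers every item the same way relative to the others), both coefficients equal 1.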
In this example, the "nocorr" option suppresses the item correlation information. Although the correlation matrix can be used to examine whether particular items are negatively correlated with others, a more efficient way is to check the table entitled "If items are deleted...". That table tells you whether particular items are negatively correlated with the total, so it is safe to suppress the correlation matrix from the output. "If items are deleted..." will be explained in a later section.
It is important to include the "nomiss" option in the procedure statement. If a test taker or survey participant did not answer several questions, Cronbach Alpha will not be computed. In surveys, it is not unusual for respondents to skip questions that they don't want to answer. Also, if you use a scanning device to record responses, faint pencil marks may not be detected by the scanner. In both cases you will have "holes" in your data set, and the Cronbach Alpha procedure will halt. To prevent this problem, the "nomiss" option tells SAS to ignore cases that have missing values.
However, with the preceding approach, even if a test taker or survey participant skips only one question, his entire test will be ignored by SAS. In a speeded test, where participants may not be able to complete all items, the use of "nomiss" will lead to some loss of information. One way to overcome this problem is to set a criterion for a valid test response, say that at least 80 percent of the test items must be answered for the record to be included in the analysis.
Under this criterion, a record with more than 20 percent of the questions unanswered (more than one out of five items, in this example) is deleted. In the remaining records, the missing values are replaced by a zero, and these records are counted in the analysis.
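Under the stated assumptions (right/wrong items where a skipped question can be scored as wrong, and missing responses coded as NaN), the screening logic can be sketched as:

```python
import numpy as np

def screen_records(data, min_answered=0.8):
    """Drop records with fewer than min_answered (e.g. 80%) of items
    answered; score the remaining missing answers (NaN) as wrong (0)."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    answered = (~np.isnan(data)).sum(axis=1)   # items answered per record
    kept = data[answered >= min_answered * k]  # enforce the 80% criterion
    return np.nan_to_num(kept, nan=0.0)        # remaining NaN -> 0 (wrong)

responses = np.array([[1, 0, 1, 1, np.nan],        # 4 of 5 answered: kept
                      [1, np.nan, np.nan, 1, 0]])  # 3 of 5 answered: dropped
clean = screen_records(responses)
```

Only the first record survives, with its single skipped item scored as 0.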
It is acceptable to count missing responses on a test as wrong answers and assign a value of zero to them, but it is not appropriate to do so if the instrument is a survey such as an attitude scale. One popular approach for dealing with missing data in surveys is the mean replacement method (Afifi & Elashoff, 1966), in which item means are used to replace the missing data. The SAS code is the same as the preceding one, except that the missing value is replaced by the item mean rather than by zero.
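A sketch of mean replacement, again assuming missing responses are coded as NaN: each missing value is replaced by the mean of the non-missing responses to the same item.

```python
import numpy as np

def mean_replace(data):
    """Replace each missing value with the mean of the
    non-missing responses to the same item (column)."""
    data = np.asarray(data, dtype=float).copy()
    item_means = np.nanmean(data, axis=0)      # per-item means, ignoring NaN
    rows, cols = np.where(np.isnan(data))      # locations of missing values
    data[rows, cols] = item_means[cols]
    return data

survey = np.array([[5, 4],
                   [3, np.nan],   # missing -> mean of item 2 = (4 + 2) / 2
                   [1, 2]])
filled = mean_replace(survey)
```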
How to interpret the SAS output
Descriptive statistics
The mean of each item tells you how difficult the item is. Because in this case the answer is either right (1) or wrong (0), the mean ranges from 0 to 1. A mean of 0.9 indicates that the question is fairly easy: 90% of the test takers or survey participants answered it correctly. It is a common mistake to look at each item individually and throw out any item that appears to be too difficult or too easy. Actually, you should take the entire test into consideration; this will be discussed later.
Raw and standardized Cronbach Coefficient Alpha
The Cronbach Alpha procedure returns two coefficients: the raw coefficient, which is based upon item covariances, and the standardized coefficient, which is based upon item correlations.
The higher the Alpha, the more reliable the test. There is no generally agreed cut-off; usually 0.7 and above is acceptable (Nunnally, 1978). It is a common misconception that a low Alpha necessarily means a bad test. Actually, your test may measure several attributes/dimensions rather than one, and thus the Cronbach Alpha is deflated. For example, it is expected that the scores of GRE-Verbal, GRE-Quantitative, and GRE-Analytical may not be highly correlated, because they evaluate different types of knowledge.
If your test is not internally consistent, you may want to perform a factor analysis to combine items into a few factors. You may also drop the items that weaken the overall consistency, which will be discussed next.
It is very important to notice that Cronbach Alpha takes variance (the spread of the distribution) into account. For example, when you compare the mean scores in the following two tables, you can see that both the pre-test and post-test responses are consistent. However, the Alpha of the post-test is only .30 (raw) and .29 (standardized), while the Alpha of the pre-test is as high as .60 (raw and standardized). This is because the standard deviation (SD) of the post-test items ranges from .17 to .28, whereas the SD of the pre-test items is more uniform (.42-.48).
If the item is deleted...
As mentioned before, a good analysis of test items should take the whole test into consideration. The following table tells you how each item is correlated with the entire test and what the Alpha would be if that item were deleted. For example, the first line shows the correlation coefficient between post-test item 1 and the composite score of post-test items 1-5. The first item is negatively correlated with the total score; if it is deleted, the Alpha will improve to .41 (raw) or .42 (standardized). Question 5 has the strongest relationship with the entire test; if this item is removed, the Alpha will drop to -.01 (raw) or .04 (standardized). This approach helps you to spot the bad apples and retain the good ones.
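This item-total analysis can be sketched as follows (the data are hypothetical: item 3 is keyed in the wrong direction, so it correlates negatively with the rest of the test and deleting it raises Alpha):

```python
import numpy as np

def cronbach_alpha(items):
    """Raw Cronbach Alpha for an (n_respondents, k_items) array."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

def alpha_if_item_deleted(items):
    """For each item: its correlation with the total of the other
    items, and what Alpha becomes when the item is removed."""
    items = np.asarray(items, dtype=float)
    results = []
    for j in range(items.shape[1]):
        rest = np.delete(items, j, axis=1)               # drop item j
        r = np.corrcoef(items[:, j], rest.sum(axis=1))[0, 1]
        results.append((r, cronbach_alpha(rest)))
    return results

# Items 1 and 2 mostly agree; item 3 runs against them
items = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 1, 0],
                  [0, 0, 0]], dtype=float)
analysis = alpha_if_item_deleted(items)   # item 3 drags Alpha down
```

The "bad apple" shows up as a negative item-total correlation paired with a higher Alpha-if-deleted than the full-test Alpha.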
Once again, variance plays a vital role in the Cronbach Alpha calculation; without variance there will be no result. The following questions are from another post-test. Everybody scored two of the questions (1.00), and everybody missed another (0.00). Because there is no variance in those items, the standardized Cronbach Alpha, which is based on the correlation matrix, cannot be computed at all.
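A minimal demonstration of why: when an item has zero variance, its correlation with every other item is 0/0, so the correlation matrix that the standardized coefficient needs is undefined (the data below are hypothetical).

```python
import numpy as np

# Item 1 was answered correctly by everyone (no variance), so its
# correlation with any other item is undefined (NaN).
items = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [1, 0, 0]], dtype=float)
with np.errstate(invalid="ignore", divide="ignore"):
    corr = np.corrcoef(items, rowvar=False)
# corr[0, 1] is NaN; the zero-variance item breaks the correlation matrix
```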
References
Afifi, A. A., & Elashoff, R. M. (1966). Missing observations in multivariate statistics: Part I. Review of the literature. Journal of the American Statistical Association, 61, 595-604.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.