Experiment and Non-experiment
Chong-ho Yu, Ph.D., CNE, MCSE, CCNA
Experimental research and non-experimental research
"Experiment" is a widely misused term. When some people talk about their "experiment," their study is in fact non-experimental in nature. The following are the characteristics of experimental and non-experimental research designs.
Experiment
- Random sampling: a sampling method in which each member of a set has an independent chance of being selected.
- Randomization: randomly assigning subjects to the control group and the treatment group.
- Experimenter manipulation: directly manipulating variables to test cause-and-effect relationships, e.g., altering the amount of drug given to the patients.
- Experimenter control: involves control of all other extraneous variables or conditions that might have an impact on the dependent variables.
It is very common for even experienced researchers to be confused by random sampling and randomization. For example, Morse (2007) wrote,
What is wrong with randomization? Processes of saturation are essential in qualitative inquiry: saturation ensures replication and validation of data; and it ensures that our data are valid and reliable. If we select a sample randomly, the factors that we are interested in for our study would be normally distributed in our data, and be represented by some sort of a curve, normal or skewed. Regardless of the type of curve, we would have lots of data about common events, and inadequate data about less common events. Given that a qualitative data set requires a more rectangular distribution to achieve saturation, with randomization we would have too much data around the mean (and be swamped with the excess), and not enough data to saturate on categories in the tails of the distribution (p. 234)
(Emphasis added by the author).
In this passage "randomization" is used to mean selecting a sample randomly. The two are distinct: randomization is concerned with the assignment of group membership after the sample is drawn, whereas random sampling is a subject selection process.
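The distinction can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 subject IDs, the sample size of 40, and the function names are assumptions made up for the example, not part of any study cited here.

```python
import random

def draw_random_sample(population, n, rng):
    """Random sampling: pick n members, each with the same chance of selection."""
    return rng.sample(population, n)

def randomize(subjects, rng):
    """Randomization: shuffle the subjects already drawn and split them into two groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)                  # fixed seed so the sketch is repeatable
population = list(range(1000))           # hypothetical subject IDs
sample = draw_random_sample(population, 40, rng)   # step 1: who is in the study
control, treatment = randomize(sample, rng)        # step 2: who receives the treatment
print(len(sample), len(control), len(treatment))
```

Step 1 decides who enters the study; step 2 decides which group each subject joins. A study can perform either step without the other.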
Control and manipulation are crucial to experimentation. Without them, the conclusion drawn from an observed phenomenon could be completely wrong even if it makes sense. Let's look at an everyday example: One of my friends has two TV sets, one Japanese-made and the other European-made. She insisted that the Japanese TV was of better quality than the European one because it presented a sharper picture. Being skeptical of her claim, I conducted a small experiment: I simply swapped the locations of the two TV sets. As a result, the European TV set showed a clearer picture than the Japanese one. As you can see, the deciding factor here is the signal rather than the electronics.
In this case, the location as a source of "noise" is under my control.
Let's use herbs as another example: A Chinese friend maintained that some Chinese herbs could heal certain diseases. She even conducted an experiment to prove it. When her husband suffered a long-term illness, he took Chinese herbs for one week and his health condition improved substantially. The next week he stopped taking the herbs and the improvement reversed. When I asked her how many types of Chinese herbs her husband took, she answered, "Ten." If I feed a patient 10 vitamins, I am sure he will get better, too! Because of the lack of manipulation/partition of the chemical components of the herbs, this "experiment" did not tell us which Chinese herb is helpful to which body function.
However, it is important to note that "control" is not the core essence of experimentation. The difference between a controlled experiment and a randomized experiment will be discussed in a later section.
Quasi-experiment
A quasi-experiment is a research design that does not meet all the requirements necessary for controlling the influence of extraneous variables. Usually what is missing is random assignment.
For example, when a researcher studies gender differences in computer use, obviously he cannot randomly assign gender (I am happy as a man. I don't want to be re-assigned).
Survey research
This type of research is very common in political science and communications, in which many variables are not controllable. For example, if you intend to study how wars affect people's perception of the quality of policy making, you cannot create a war or manipulate other world affairs, unless you are the villain in the movie "Tomorrow Never Dies." Because of this limitation, researchers send surveys to participants who are exposed to the real conditions.
Archival research
This approach is popular in economics and educational research, especially when the research project involves trends or longitudinal data. For example, if the researcher wants to find out the correlation between productivity and school performance, he can contact the General Accounting Office and the Department of Education to obtain the related data from the last twenty years.
Comments
Both natural settings and laboratory-controlled experiments have pros and cons. On some occasions, things that happen in real life challenge the results of artificial experiments. For example, in some lab-controlled benchmark tests, Windows NT outperforms Novell Netware, Linux, and even UNIX! But computer users tell different stories in real settings.
It is common that experimentation is equated with scientific methodology, and thus is highly regarded. Actually, certain sciences do not rely heavily on experimentation. For example, astronomy is no less highly determined and well-developed a science than physics. Data collected by and theories derived from both disciplines became the foundation of the Newtonian model. But in astronomy the major source of knowledge is observation rather than experimentation (Deese, 1972). For example, you cannot blow up Mars and see how its absence affects the gravitational force of the Solar System! (With modern rocket and nuclear technologies, humans may be able to do so, but we shouldn't.) Mathematics is another example. Although today, with the aid of high-powered computers, several mathematicians are able to conduct "mathematical experiments" by simulation (Chaitin, 1998), mathematical theorems basically originate from logical deduction. Lack of experimentation can also be found in certain areas of biology such as evolution. Barkow (1989) pointed out that an evolutionary scenario is speculative, in that the usual requirements for empirical verifiability are relaxed in favor of an emphasis on logic and plausibility.
Randomization and Simpson's Paradox
Randomization is the major difference between experiment and quasi-experiment. It is important to point out some common misconceptions regarding randomization.
Random sampling and randomization
As mentioned before, many people confuse random sampling and randomization. The former is a sampling process while the latter is concerned with the assignment of group membership. Further, the purpose of random sampling is to enhance the generalizability of the results, while the purpose of randomization is to support cause-and-effect interpretations of the results. In other words, random sampling counteracts threats to external validity whereas randomization addresses threats to internal validity. However, the above concepts are easily confused (May & Hunter, 1988). The topic of internal validity and external validity will be discussed in a later section.
In practice, randomization plays a more important role than random sampling in research. Let's face it. How often can a researcher draw a random sample? If the target population is all university students, are you able to draw samples from campuses in states other than your own? As a matter of fact, most research studies recruit convenience subjects who are instantly available (Frick, 1998). If the requirement of random sampling were strictly followed, experiments could hardly be implemented. In fact, Reichardt and Gollob (1999) found that in a randomized experiment, the use of a t test with a convenience sample can be justified without reference to a hypothetical infinite population from which random samples are drawn.
To rectify the situation of non-random sampling, randomization is used to spread errors randomly among treatment groups (Fisher, 1971). Pitman (1937a, 1937b, 1938) went so far as to assert that random sampling is unnecessary for a valid test of the difference between treatments in a randomized experiment. Using an example of 40 convenience subjects, Babbie (1992) conceptualized randomization as treating convenience samples as probability samples: "It is as though 40 subjects in this instance are a population from which we select two probability samples-each consisting the characteristics of the total population, so the two samples will mirror each other." (p.243)
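The idea that a randomized experiment can be tested without appealing to random sampling can be illustrated with a randomization (permutation) test. The sketch below is a generic Monte Carlo version, not a reproduction of Pitman's derivations; the function name, the number of shuffles, and the scores are all hypothetical.

```python
import random
from statistics import mean

def randomization_test(treatment, control, n_shuffles=5000, seed=0):
    """Approximate two-sided p-value for the difference in group means.

    Group labels are reshuffled many times; the p-value is the share of
    reshuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_t]) - mean(pooled[n_t:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores from 20 convenience subjects, 10 per group.
treatment_scores = [14, 15, 13, 17, 16, 18, 15, 19, 16, 17]
control_scores = [11, 12, 10, 13, 12, 9, 11, 14, 10, 12]
print(randomization_test(treatment_scores, control_scores))
```

The reference distribution here comes only from the random assignment of these particular subjects, so no hypothetical infinite population is invoked. Whether the result generalizes beyond them is a separate question of external validity.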
Simpson's Paradox
Be careful! Randomization is not a silver bullet. It is subject to the threat of Simpson's Paradox, which was discovered by Dr. E. H. Simpson (1951), not O. J. Simpson or Bart Simpson. Simpson's Paradox is a phenomenon in which the conclusion drawn from the aggregate data is opposite to the conclusion drawn from the contingency table based upon the same data.
If this is too abstract, let's look at an example: In England a 20-year follow-up study was once conducted to examine the survival rate and death rate of smokers and non-smokers. The result implied a significant positive effect of smoking because only 24% of smokers died compared to 31% of non-smokers. Philip Morris should celebrate, right? Not yet. When the data were broken down by age group in a contingency table, it was found that there were more older people in the non-smoker group (Appleton & French, 1996), so the higher overall death rate among non-smokers reflected their age rather than any benefit of smoking.
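A small numerical sketch may make the reversal concrete. The counts below are invented for illustration and are not the data reported by Appleton and French (1996); the two age bands and the function name are likewise assumptions.

```python
# Hypothetical counts, NOT the data from Appleton and French (1996).
# (group, age_band) -> (deaths, subjects)
counts = {
    ("smoker", "younger"): (80, 800),
    ("smoker", "older"): (120, 200),
    ("non-smoker", "younger"): (20, 400),
    ("non-smoker", "older"): (300, 600),
}

def death_rate(group, age_band=None):
    """Death rate for a group, overall (age_band=None) or within one age band."""
    deaths = subjects = 0
    for (g, band), (d, n) in counts.items():
        if g == group and age_band in (None, band):
            deaths += d
            subjects += n
    return deaths / subjects

for band in (None, "younger", "older"):
    label = band or "all ages"
    print(label, death_rate("smoker", band), death_rate("non-smoker", band))
```

In these made-up numbers, smokers have the lower death rate in the aggregate but the higher death rate within each age band, because most of the non-smokers fall in the older band.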
Another example of Simpson's Paradox can be found in a study regarding student retention conducted at Arizona State University. Although the initial analysis based on all data (Yu, DiGangi, Jannasch-Pennell, Lo, Kaprolet, & Kim, 2007) shows that among the students who stay at the university, the probability of being a resident (p = .67) is higher than that of being a non-resident (p = .33), a seemingly opposite conclusion emerges when observations are grouped by state in a GIS analysis, as shown in Figure 1:

Figure 1. Retention rate mapped to student home states
How is Simpson's Paradox related to randomization? Obviously, the smoking study above used non-experimental data. You cannot ask people to become smokers or non-smokers. Neither can age be assigned (I wish it could be. If so, I would ask to be assigned to the young age group). As a result, two groups that were non-equivalent in age led to Simpson's Paradox. Although randomization is said to prevent this from happening, it is not 100% foolproof. By simulation, Hsu (1989) found that when the sample size is small, randomization tends to leave groups non-equivalent and increases the possibility of Simpson's Paradox. Thus, after randomization with a small sample size, researchers should check the group characteristics on different dimensions (e.g., race, sex, age, academic year) rather than blindly trusting randomization.
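The small-sample problem can be explored with a simple simulation. The sketch below is not Hsu's (1989) procedure; it assumes a single binary trait carried by half of the subjects, and the 0.25 threshold for calling two groups "non-equivalent" is an arbitrary choice for illustration.

```python
import random

def imbalance_rate(n_subjects, threshold=0.25, n_trials=2000, seed=1):
    """Share of random assignments in which the two groups differ on a
    binary trait by at least `threshold` (difference in proportions)."""
    rng = random.Random(seed)
    half = n_subjects // 2
    # Half of the subjects carry the trait (coded 1), e.g. "older".
    subjects = [1] * half + [0] * (n_subjects - half)
    flagged = 0
    for _ in range(n_trials):
        rng.shuffle(subjects)                       # random assignment
        group_a, group_b = subjects[:half], subjects[half:]
        gap = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if gap >= threshold:
            flagged += 1
    return flagged / n_trials

for n in (8, 40, 200):
    print(n, imbalance_rate(n))
```

Under these assumptions, noticeable imbalance is common with eight subjects and rare with two hundred, which is why a balance check after randomization matters most in small studies.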
Randomized and controlled experiments
Another area of confusion is the difference between randomized and controlled experiments. Today "randomized experiment" and "controlled experiment" are often used synonymously. One of the reasons is that an experiment usually consists of a control group and a treatment group, and subjects are randomly assigned to one of the groups. Since "control" and "randomization" are both perceived as characteristics of an experiment, it is not surprising that in many texts the two terms are either used interchangeably or combined into one term such as "randomized controlled experiment."
The latter usage is legitimate as long as both control and randomization are implemented in the experiment. However, treating a randomized experiment as "a controlled experiment" and vice versa is misleading (e.g., "In controlled experiments, this is accomplished in part through the random assignment of participants to treatment and control groups" (Schneider et al., 2008)). Indeed, there is a subtle difference between the two.
Randomized experiment
R. A. Fisher is the pioneer of the randomized experiment. In Fisher's view, even if there is a significant difference between the control and the treatment groups, we may not be able to attribute the difference to the treatment when there exist many uncontrollable variables and sampling fluctuations. The objective of randomization is to differentiate between associations due to causal effects of the treatment and associations due to some variable that is a common cause of both the treatment and response variables. If there are influences resulting from uncontrolled variables, randomization distributes those influences randomly across the control and treatment groups even though the variables themselves are not controlled.
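To see how randomization separates a treatment effect from a common cause, consider the following simulation sketch. Everything in it is hypothetical: an unmeasured "motivation" variable drives the outcome, the treatment itself has no effect at all, and subjects either choose the treatment themselves or are assigned by a coin flip.

```python
import random
from statistics import mean

def estimated_effect(randomized, n=4000, seed=11):
    """Difference in mean outcome (treated minus untreated) when the
    true treatment effect is zero."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)              # uncontrolled common cause
        if randomized:
            gets_treatment = rng.random() < 0.5   # coin-flip assignment
        else:
            gets_treatment = motivation > 0       # self-selection into treatment
        # The outcome depends on motivation only; the treatment adds nothing.
        outcome = 2.0 * motivation + rng.gauss(0, 1)
        (treated if gets_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)

print("self-selected:", estimated_effect(randomized=False))
print("randomized:   ", estimated_effect(randomized=True))
```

With self-selection, the common cause masquerades as a large treatment effect. With random assignment, the influence of motivation is spread across both groups and the estimated effect stays close to zero, even though motivation was never measured or held constant.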
Controlled experiment
On the other hand, the logic of experimentation up to Fisher's time was that of the controlled experiment. In a controlled experiment, many variables are experimentally fixed to a constant value. However, Fisher explicitly stated that it is an inferior method, because it is impossible to know what variables should be taken into account. For example, a careful researcher may assign equal numbers of males and females to each group, but she/he may omit the age and educational level of the subjects. In Fisher's view, instead of attempting to put everything under control, the researcher should let randomization take care of the uncontrollable factors. This is not to suggest that Fisher did not advocate controlling for other causes in addition to randomization. Rather, he explicitly recommended that the researcher exercise as much control as he can, but he advised that randomization must be employed as "the second line of defense" (Shipley, 2000).
Comments
Following the same line of reasoning, the Canadian Task Force on Preventive Health Care (2003) prefers randomized experiments to controlled trials without randomization as clinical evidence, as shown in the following table.
| Rating | Research design |
| --- | --- |
| I | Evidence from randomized controlled trial(s) |
| II-1 | Evidence from controlled trial(s) without randomization |
| II-2 | Evidence from cohort or case-control analytic studies, preferably from more than one centre or research group |
| II-3 | Evidence from comparisons between times or places with or without the intervention; dramatic results in uncontrolled experiments could be included here |
| III | Opinions of respected authorities, based on clinical experience; descriptive studies or reports of expert committees |
Nonetheless, a randomized experiment is not necessarily superior to a controlled experiment. As mentioned before, when the sample size is small, randomization tends to leave groups non-equivalent and thus may lead to Simpson's Paradox (Hsu, 1989). Not surprisingly, when the sample size is small, a controlled experiment is more advisable.
In addition, Berwick (2008) challenged the view that randomized experiments can be applied to all situations. Many years ago the Rapid Response Team (RRT), an innovative preventive health care approach introduced by Australian doctors in which a team of physicians and nurses monitors patients' vital signs and takes proactive action, was implemented in the United States. However, randomized experiments conducted by American researchers showed no significant difference between RRT and non-RRT approaches in reducing the number of unexpected deaths. Berwick questioned the validity of this conclusion, for it ignored the cultural context and the specific delivery mechanisms.
Similarly, Rawlins disputed the experimental "gold standard" in medical research by listing the limitations of randomized and controlled experiments. First, like social scientists, medical researchers sometimes face a "mission impossible" scenario when the disease under investigation is extremely rare and thus the number of patients is very small. Second, on some occasions experimentation is unnecessary, especially when a treatment produces a "dramatic" benefit, such as Imatinib (Glivec) for chronic myeloid leukemia. In health science research there is a stopping rule: when the treatment shows healing effects, the trial should be stopped early so that the control group can switch to the more effective treatment. There is no consensus among statisticians as to how best to handle this situation, but treating this type of incomplete experiment as invalid would throw out valuable information (cited in Medical News Today, 2008).
Causal Inferences
Ruling out rival interpretations in quasi-experiments
Some statisticians assert that one can never draw causal inferences without experimental manipulation (e.g. SAS Institute, 1999). Some researchers argued that causal inferences are weakened in quasi-experiments (e.g. Keppel & Zedeck, 1989). However, Christensen (1988) held a more liberal position:
Many causal inferences are made without using the experimental framework; they are made by rendering other rival interpretations implausible. If a friend of yours unknowingly stepped in front of an oncoming car and was pronounced dead after being hit by the car, you would probably attribute her death to the moving vehicle. Your friend might have died as a result of numerous other causes (a heart attack, for example), but such alternative explanations are not accepted because they are not plausible. In like manner, the causal interpretations arrived at from quasi-experimentation analysis are those that are consistent with the data in situations where rival interpretations have been shown to be implausible. (p.306)
Lurking variables and theoretical causal variables in correlational studies
Archival research is also called correlational research because cause-and-effect inferences cannot be directly made. For example, even though data from the last twenty years show a positive correlation between productivity and school performance, it would be a leap of faith to conclude that the gain in school performance is the cause of the gain in productivity or vice versa. Often another variable, which may be the true cause, is "lurking" in the background. This variable is called a lurking variable, and it easily goes undetected in a correlational study.
There are many jokes about careless use of correlational studies. For example, a study once indicated that consumption of alcohol improves academic performance (the explanation may be something else: when the overall economy improves, both alcohol consumption and academic performance go up). A study in Taiwan during the 1970s indicated that the more woks a household owned, the fewer children the family had. Thus, the government gave woks to households in an attempt to lower the national birth rate. The moral of these stories: researchers should select theoretical causal variables even though the study is correlational.
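The alcohol example can be mimicked with simulated data. In the sketch below, a hypothetical "economy" variable drives both alcohol consumption and grades, which have no direct link to each other. The data, variable names, and effect sizes are invented; the partial correlation is the standard first-order formula.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(7)
economy = [rng.gauss(0, 1) for _ in range(2000)]    # the lurking variable
alcohol = [e + rng.gauss(0, 1) for e in economy]    # driven by the economy only
grades = [e + rng.gauss(0, 1) for e in economy]     # driven by the economy only

print("raw correlation:    ", pearson(alcohol, grades))
print("partial correlation:", partial_corr(alcohol, grades, economy))
```

The raw correlation is clearly positive, yet it largely vanishes once the economy is held constant. The catch, of course, is that a researcher can only partial out a lurking variable that theory has already led him to measure.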
Nevertheless, Luker, Luker, Jr., Cobb, and Brown (1998) defended the use of causal inference in correlation/regression frameworks:
In the social and behavioral sciences, experimental randomization and control are usually not possible. This has led to an awkward condition in which our work does not permit useful policy recommendations. The well-intentioned assertion that relationships do not mean causation, while useful in contesting gross simple-mindedness, is paralyzing and misleading in the social sciences. Or, as Dewey puts it, the critical characteristic of all scientific operations is revealing relationships. Relationships are a necessary condition of causation. We know that X cannot be a cause of Y unless X and Y are related. The causal analysis of nonexperimental data, therefore, can only go on through the analysis of relationships. Causal inference from non-experimental data, then, requires the testing of theoretical causal variables in a variety of quasi-experimental or multiple regression frameworks...Statistical failures of models suggest that we are not on the right track. Confirmation of the models suggests the possibility of ameliorative solutions.
Explicit questions and selection bias in survey research
Whether causal inferences can be drawn from survey research is debatable. It is true that survey research does not involve any variable manipulation. However, when a questionnaire includes explicit questions concerning rationale and motivation, such as "Why do you choose Web-based instruction over conventional instruction?", it is difficult to argue that the answers provided by respondents do not indicate any cause and effect.
Generalizability always comes hand in hand with causal inferences. Survey research is not weaker than experiment in this regard. In many situations, survey research tends to obtain a more random sample than experimental research does. Usually subjects are required to be physically present in experimental studies, and thus only convenience samples are recruited from the local campus or the local town. On the other hand, survey research can break through this limitation by sending questionnaires to prospective subjects across the country. In the age of the Internet, the researcher can even set up an online form to reach potential respondents all over the world.
However, one may argue that a "cyber-sample" is a self-selected sample rather than a random sample. In this case a systematic bias may affect who responds to the questionnaire and who doesn't. The "Dewey Defeats Truman" prediction by the Chicago Daily Tribune in the 1948 presidential election is a classic example of selection bias. The interviewees were polled by phone, and thus the sample was confined to households that owned a telephone. By the same token, when a survey is posted on the Web, it is likely that respondents are computer literate and have access to computer equipment. Indeed, the same problem can be found in experimental research. Subjects could refuse to participate in the experiment or withdraw from the study after they have started. In both survey research and experimental research, the question is not whether there are missing data. Rather, the question should be: "Are data missing at random?"
Nonetheless, if the subject matter to be studied is Web-based instruction, this should not be considered a selection bias. In an online survey concerning Web-based instruction, the researcher should expect that all respondents possess basic computer operation skills and have access to the Internet. (Once I helped a researcher post an online survey on my database server, but several respondents, who used 2400 baud modems, complained that it took five to ten minutes to load a page.)
Research design and statistical analysis
Traditionally, analysis of variance (ANOVA) is said to be appropriate for data collected in an experiment whereas regression analysis is considered a proper method for data collected in non-experimental designs. Keppel and Zedeck (1989) argued that both ANOVA and regression are suitable for experimental designs while only regression is suitable for most non-experimental designs. In other words, regression is applicable to both experimental and non-experimental designs, whether the independent variables are continuous, categorical, or both. For this reason, Pedhazur and Schmelkin (1991) asserted that regression is superior to ANOVA. Pedhazur and Schmelkin also criticized the practice, found in some non-experimental studies, of converting continuous variables into categorical variables in order to fit the data into an ANOVA framework as if the design were experimental. This conversion not only leads to loss of information, but also changes the nature of the variables and the design.
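The loss of information from converting a continuous variable into categories can be seen in a short simulation. The sketch uses invented data and a median split, one common way of dichotomizing; it is meant only to illustrate the point about information loss, not to reproduce any analysis from Pedhazur and Schmelkin.

```python
import random
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(2000)]     # hypothetical continuous predictor
y = [a + rng.gauss(0, 1) for a in x]           # outcome linearly related to x

cut = median(x)
x_split = [1 if a > cut else 0 for a in x]     # "high" vs "low" group, as for ANOVA

r_continuous = pearson(x, y)
r_split = pearson(x_split, y)
print("continuous predictor:", r_continuous)
print("median-split groups: ", r_split)
```

The dichotomized predictor shows a visibly weaker relationship with the outcome than the original continuous one, because every subject above the median is treated as identical, and likewise every subject below it.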
Further Reading
For beginners
Kerlinger (1986) and Cook and Campbell (1979) are two good books for getting started with experimental design, for neither book requires a strong mathematical or statistical background. Both concentrate on the design aspect rather than the analysis aspect.
Montgomery (1997) is an up-to-date and comprehensive book, though it is written for engineering majors. Readers should be able to follow the content after taking one or two introductory statistics courses. You may skip the chapter on response surface because it may not be applicable to educational and psychological research. Dr. Montgomery is a professor of Industrial Engineering at Arizona State University.
For intermediate users
Kennedy and Bush's (1985) book was written for graduate students in education and psychology who have a modest background in both mathematics and statistics and who are interested in a subject-matter field rather than statistical methodology. One nice thing about the book is that it explains the mathematical notation, which is confusing to many readers.
For beginner and intermediate users
Levine & Parkinson (1994) is a book for both beginners and research professionals. The first half of the book covers experimental methods for psychologists in general whereas the second half covers very detailed examples of experimental methods in cognitive psychology, social psychology, and clinical psychology. Levine and Parkinson are professors of psychology at Arizona State University.
For advanced users
Maxwell & Delaney (1990) and Winer, Brown, and Michels (1991) are considered classics in the field of experimental design. Their books cover both the design and the analysis aspects. However, their books require a very strong statistical background.
Last revised: 2009 April
References
- Appleton, D. R. & French, J. M. (1996). Ignoring a covariate: An example of Simpson's paradox. American Statistician, 50, 340-341.
- Babbie, E. (1992). The practice of social research (6th ed.). Belmont, CA: Wadsworth.
- Barkow, J. H. (1989). Darwin, sex, and status: Biological approaches to mind and culture. Toronto: University of Toronto Press.
- Berwick, D. (2008, August). Inference and improvement in health care. Paper presented at the 2008 Joint Statistical Meeting, Denver, CO.
- Canadian Task Force on Preventive Health Care. (2003). Canadian Task Force on Preventive Health Care levels of evidence used to rate research design and quality of individual studies. Retrieved August 13, 2008, from http://www.ctfphc.org/
- Chaitin, G. J. (1998). The limits of mathematics: A course on information theory and the limits of formal reasoning. Singapore: Springer-Verlag.
- Christensen, L. B. (1988). Experimental methodology. Boston: Allyn and Bacon.
- Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
- Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin Company.
- Deese, J. (1972). Psychology as science and art. New York, NY: Harcourt Brace Jovanovich, Inc.
- Fisher, R. A. (1971). The design of experiments (9th ed.). New York: Hafner Publishing Company.
- Frick, R. W. (1998). Interpreting statistical testing: Process and propensity, not population and random sampling. Behavior Research Methods, Instruments, & Computers, 30, 527-535.
- Hsu, L. M. (1989). Random sampling, randomization, and equivalence of contrasted groups in psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 57, 131-137.
- Kennedy, J. J. & Bush, A. J. (1984). An introduction to the design and analysis of experiments. Lanham, MD: University Press of America, Inc.
- Keppel, G. & Zedeck, S. (1989). Data analysis for research design: Analysis of variance and multiple regression/correlation approaches. New York: W. H. Freeman.
- Kerlinger, F. N. (1986). Foundations of behavioral research. New York: Holt, Rinehart and Winston.
- Levine, G., & Parkinson, S. (1994). Experimental methods in psychology. Hillsdale, N.J.: L. Erlbaum.
- Luker, B., Luker, B. Jr., Cobb, S. L., & Brown, R. (1998). Postmodernism, institutionalism, and statistics: Considerations for an institutionalist statistical method. Journal of Economic Issues, 32, 449-457.
- Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth Publishing Company.
- May, R. B., & Hunter, M. A. (1988). Interpreting students' interpretations of research. Teaching of Psychology, 15, 156-158.
- Medical News Today. (2008). Attack traditional ways of assessing the evidence of therapeutic interventions. Retrieved February 24, 2009, from http://www.medicalnewstoday.com/articles/126043.php
- Montgomery, D. C. (1997). Design and analysis of experiments. New York: Wiley.
- Morse, J. (2007). Sampling in grounded theory. In A. Bryant & K. Charmaz (Eds.), Sage handbook of grounded theory (pp. 229-244). Los Angeles, CA: Sage.
- Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, N.J.: Lawrence Erlbaum Associates.
- Pitman, E. J. G. (1937a). Significance tests which may be applied to samples from any populations. Journal of Royal Statistical Society B, 4, 119-130.
- Pitman, E. J. G. (1937b). Significance tests which may be applied to samples from any populations II: The correlation coefficient. Journal of Royal Statistical Society B, 4, 225-232.
- Pitman, E. J. G. (1938). Significance tests which may be applied to samples from any populations III: The analysis of variance test. Journal of Royal Statistical Society B, 29, 322-335.
- Reichardt, C. S., & Gollob, H. F. (1999). Justifying the use and increasing the power of a t test for a randomized experiment with a convenience sample. Psychological Methods, 4, 117-128.
- SAS Institute. (1999). Comments on interpreting regression statistics. [On-line] Available URL: http://www.sas.com
- Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2008). Estimating causal effects using experimental and observational designs: A think tank white paper. Washington, D.C.: American Educational Research Association.
- Shipley, B. (2000). Cause and correlation in biology: A user's guide to path analysis, structural equations and causal inferences. Cambridge: Cambridge University Press.
- Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Ser. B., 13, 238-241.
- Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design. New York: McGraw-Hill, Inc.
- Yu, C. H., DiGangi, S., Jannasch-Pennell, A., Lo, W. J., Kaprolet, C., & Kim, C. (2007, February). A data mining approach to retention. Paper presented at the 2007 EDUCAUSE Southwest Regional Conference, Austin, TX.