Power analysis
Chong-ho Yu, Ph.D., CNE, MCSE, CCNA
Factors affecting power
Power is determined by the following:
- Alpha level
- Effect size
- Sample size
- Variance
- Direction (one or two tailed)
Generally speaking, when the alpha level, the effect size, or the sample size increases, the power level increases. Please view this QuickTime animated demo to learn the above relationships. If you want to examine the relationships frame by frame, please look at this Shockwave slide show. (Caution: both modules are large files; please view them over a T1 network rather than a modem connection.)
However, there is an inverse relationship between variance and power.
Variance resulting from measurement error acts as noise and can therefore lower
the power level. More specifically, because a high degree of measurement error
prevents the true condition of the variable from being indicated correctly, it
reduces the probability of correctly detecting the effect under study
(Environmental Protection Agency, 2007).
The role of direction in power analysis is very straightforward. Given that
all other conditions remain the same (alpha, sample size, etc.), moving the test
from one-sided to two-sided would decrease the power level, as shown in the
following example.
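The relationships described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a two-sample z-test with a known, common standard deviation and equal group sizes. The function name z_test_power and all of the numbers are hypothetical and are not taken from any of the packages discussed later.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def z_test_power(mean_diff, sd, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample z-test (hypothetical helper).

    Assumes a known common SD, equal group sizes, and mean_diff > 0.
    """
    # Expected value of the z statistic when the effect is real.
    shift = (mean_diff / sd) * sqrt(n_per_group / 2)
    if two_sided:
        crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection region.
        upper = 1 - _STD_NORMAL.cdf(crit - shift)
        lower = _STD_NORMAL.cdf(-crit - shift)
        return upper + lower
    crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - _STD_NORMAL.cdf(crit - shift)


baseline = z_test_power(mean_diff=5, sd=10, n_per_group=30)
scenarios = {
    "baseline (two-sided, alpha = .05)": baseline,
    "larger alpha (.10)": z_test_power(5, 10, 30, alpha=0.10),
    "larger effect (mean difference 8)": z_test_power(8, 10, 30),
    "larger sample (60 per group)": z_test_power(5, 10, 60),
    "larger variance (SD 20)": z_test_power(5, 20, 30),
    "one-sided test": z_test_power(5, 10, 30, two_sided=False),
}
for label, power in scenarios.items():
    print(f"{label:36s} power = {power:.3f}")
```

Under these assumptions, every scenario except the larger variance yields more power than the baseline, and the one-sided test is more powerful than the two-sided test at the same alpha.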
Balancing Type I and Type II errors
Researchers always face the risk of failing to detect a true effect. This failure is called a Type II error, and its probability is known as beta. Power is defined as 1 - beta. In other words, power is the probability of detecting a true difference. To enhance the chances of unveiling a true effect, a researcher should plan a high-power test with a large sample size. However, absolute power corrupts (your research) absolutely: when the test is too powerful, even a trivial difference will be mistakenly reported as a significant one. In other words, you can "prove" virtually anything (e.g. Chinese food can cause cancer) with a very large sample size. Wrongly claiming an effect that does not exist is called a Type I error.
Power analysis is a procedure for balancing Type I and Type II errors. Simon (1999) suggested following an informal rule that alpha be set to .05 and beta to .2. In other words, power is expected to be .8. This rule implies that a Type I error is four times as costly as a Type II error. Granaas (1999) supported this rule because small-sample, low-power studies are far more common than over-powered studies. Granaas pointed out that the power level of published studies in psychology is around .4. At that level, a researcher would get the right answer more often by flipping a coin than by collecting data (Schmidt & Hunter, 1997). Granaas' observation is confirmed by both earlier and more recent studies (e.g. Cohen, 1962; Clark-Carter, 1997). Thus, pursuing higher power at the expense of inflating Type I error seems to be a reasonable course of action. There are challengers to this ".05 and .2 rule." For example, Muller and Lavange (1992) argued that for a simple study a Type I error rate of .05 is acceptable. However, pushing alpha to a more conservative level should be considered when many variables are included.
Some argue that a .05 alpha level is acceptable for a new experiment, but that a replication study should use an alpha as low as .01. This is a valid argument because the follow-up test should be tougher than the previous one. For instance, after I have passed a paper-and-pencil test on driving, I should take the road test rather than another easy test.
In most cases I follow the ".05 and .2 rule." In other words, I consider the consequences of a Type I error to be less detrimental. If I miss a true effect (Type II error), I may lose my job or an opportunity to become famous. If I commit a Type I error, such as claiming merits of Web-based instruction that do not exist, I just waste my sponsor's money on developing Web-based courses.
Type III error
Kimball (1957) coined the term "Type III error" to describe the error of giving the right answer to the wrong question. Muller and Lavange (1992) related Type III error to a misalignment between power analysis and data analysis. It is not unusual for researchers to be confused by the various types of research design and analysis, such as ANOVA, ANCOVA, and MANOVA, and consequently to conduct a power analysis for one type of design when another type is actually used. Once an auto shop installed Toyota parts in my Nissan vehicle. Researchers should do better than that.
Power of replications
The objective of research is not only to ask whether the result of a particular study is significant, but also to ask how consistent research results are across replications. Ottenbacher (1996) pointed out that the relationship between power and Type II error is widely discussed, but the fact that low statistical power can reduce the probability of successful research replication is overlooked.
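One simple way to see why low power hurts replication is the following sketch. It rests on assumptions of my own choosing rather than on Ottenbacher's analysis: the effect is real, the studies are independent, and each has the same power. The function name prob_all_significant is hypothetical.

```python
def prob_all_significant(power, n_studies=2):
    """Chance that every one of n independent studies rejects a false null.

    Assumes the effect is real and each study has the same power.
    """
    return power ** n_studies


for power in (0.4, 0.8):
    both = prob_all_significant(power)
    print(f"power {power:.1f}: original and replication both significant "
          f"with probability {both:.2f}")
```

Under these assumptions, with power of .4 the original study and its replication both reach significance only about 16% of the time; with power of .8 the figure is 64%.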
Power and Precision/Sample Power
Today many power analysis software packages are available on the market. I recommend Power and Precision (Borenstein, Cohen, & Rothstein, 1997) for its user-friendliness and its coverage of a wide variety of scenarios. This product is also marketed by SPSS Inc. under the name Sample Power.
To use this software application, enter the alpha level, the effect size, and the sample size in the proper fields. Let's use a 2×2 factorial design as an example. Given that alpha is set to .05, the effect size is .25, and power is expected to be about .83 (this is a bit over-powered, just for the sake of illustration), the recommended sample size is 140. Please go through the slide show below to see how the desired sample size is obtained.
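For readers without the software, the direction of the calculation (from alpha, power, and effect size to sample size) can be sketched with a normal approximation. This is not the algorithm used by Power and Precision, and it does not reproduce the 2×2 factorial result above. It assumes a simpler two-group comparison, where Cohen's f of .25 corresponds to d of .5 for two equal groups. The function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


d = 2 * 0.25  # for two equal groups f = d / 2, so f = .25 gives d = .5
for target in (0.80, 0.83):
    n = n_per_group(d, power=target)
    print(f"target power {target:.2f}: about {n} per group, {2 * n} in total")
```

Because the approximation ignores the extra uncertainty of estimating the variance, dedicated packages that work with t or F distributions will usually recommend slightly larger samples.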
PASS
One major shortcoming of Power and Precision is the absence of power calculation for repeated measures. Fortunately, you can find this feature in PASS (NCSS
Statistical Software, 2008). The following screenshot shows the output of a power
analysis for repeated measures conducted in PASS. The interface is very
user-friendly. Instead of presenting jargon, the output includes references for
you to cite in your paper, definitions of the terminology, and summary
statements.

G*Power
G*Power (2000) is a free program for power analysis. It has both DOS and Mac versions. Needless to say, the interface of the Mac version is much nicer. In some respects G*Power is easier to use than the previous two packages. For example, in Sample Power the user must go to another tab to enter the effect size. In PASS some power calculations, such as power for regression, do not let users enter the effect size. Instead, the effect size is calculated from R². G*Power has a different setup: all input fields are in the same window and the effect size field is very obvious.
Like other power analysis programs, G*Power can also draw a power graph so that you can see the relationship between the sample size and the power level.
EpiInfo and Stata
Different disciplines have different needs for power analysis. For example,
in epidemiology, public health, and biostatistics, it is very common to employ
a case-control design or a cross-sectional design. In addition, the outcome
variables are often dichotomous (1, 0), which necessitates power analysis
for testing proportions. The Centers for Disease Control and Prevention (2007) offers a package,
Epi Info, for this purpose. It is freeware and, understandably, its interface is
not graphical.

Stata (StataCorp, 2007), which is a commercial software package, offers a
better user interface as shown below. But users also have the option of entering
commands for faster output.
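To give a sense of what a power calculation for proportions involves, here is a minimal sketch based on the usual normal approximation for comparing two independent proportions. It is not the algorithm of Epi Info or Stata. The function name two_proportion_power and the proportions (.30 versus .15) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes equal group sizes and a normal approximation to the binomial.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2  # common proportion under the null hypothesis
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf((abs(p1 - p2) - crit * se_null) / se_alt)


# Hypothetical outcome rates in two groups of a study.
sizes = (50, 100, 200)
powers = [two_proportion_power(0.30, 0.15, n) for n in sizes]
for n, power in zip(sizes, powers):
    print(f"n per group = {n:3d}   power = {power:.3f}")
```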

SAS
Neither Power and Precision nor NCSS computes power for MANOVA. Friendly (1991) wrote a SAS Macro entitled mpower to fill this gap. This macro can perform a retrospective (post hoc) power analysis only. To use mpower, one should specify the parameters in the macro such as the names of the dataset and the dependent variables:
%macro mpower(
yvar=d1 d2, /* list of dependent variables */
data=stats, /* outstat= data set from GLM */
out=stats2, /* name of output data set with results */
alpha=.05, /* error rate for each test */
tests=WILKS PILLAI LAWLEY ROY /* tests to compute power for */
);
Next, run a GLM as usual, except that the GLM should output the statistics for mpower (in this example, the outstat data set is "stats"). Finally, call the macro using the syntax %mpower();
proc glm data=one outstat=stats;   /* outstat= must match data= in mpower */
   class f1;
   model d1 d2 = f1 / nouni;
   contrast 'effect' f1 1 -1;
   manova h=f1;
run; quit;                         /* end GLM before the macro reads stats */
%mpower();
The following is part of the output:
SAS also has a power analysis module for general purposes. The
following is a screenshot of SAS Power and Sample Size:

Practical power analysis
Running Power and Precision is easy, but obtaining adequate subjects is
difficult. One hundred and twenty-eight subjects are still obtainable. What if
Power and Precision suggests four hundred subjects? Well, you may choose not to
mention power analysis in your paper or pay ten dollars to whoever is willing to
participate in your study. Neither one seems to be a good solution. Hedges
(2006) argued that when it is impossible to increase the sample size or to
employ other resource-intensive remedies, "selection of a significance level
other than .05 (such as .10 or even .20) may be reasonable choices to balance
considerations of power and protection against Type I Errors" (cited in
Schneider, Carnoy, Kilpatrick, Schmidt, & Shavelson, 2007, p. 27).
Nonetheless, this approach might not be accepted by many thesis advisors,
journal reviewers, and editors (unless your advisor is Dr. Hedges). Let's look
at the practical side of power analysis. Muller and Lavange (1992) asserted that the following should be taken into account for power analysis:
- Money to be spent
- Personnel time of statisticians and subject matter specialists
- Time to complete the study (opportunity cost)
- Ethical costs in the research
The first three are concerned with appropriate use of resources. The last one involves risk taken by subjects during the study. For example, if a medical doctor tests the effectiveness of an experimental treatment, he/she may decide to limit the test to a small sample size.
To avoid wasting resources, one should find the optimal sample size by plotting power as a function of effect size and sample size. As you can see in the following figure, power becomes "saturated" at a certain point, i.e. the slope of the power curve decreases as the sample size increases. Beyond that point, a large increase in sample size does not lead to a corresponding increase in power.
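The saturation effect can also be seen numerically. The sketch below tabulates approximate power against sample size for a fixed effect size. It assumes a two-sided, two-sample z-test with a standardized effect size of .5. The function name power_for_n and the grid of sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_for_n(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for a standardized effect size d."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z statistic under the effect
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


sizes = list(range(20, 141, 20))  # 20, 40, ..., 140 subjects per group
powers = [power_for_n(0.5, n) for n in sizes]
# Power gained by each additional block of 20 subjects per group.
gains = [later - earlier for earlier, later in zip(powers, powers[1:])]

print(f"n per group = {sizes[0]:3d}   power = {powers[0]:.3f}")
for n, power, gain in zip(sizes[1:], powers[1:], gains):
    print(f"n per group = {n:3d}   power = {power:.3f}   gain = {gain:.3f}")
```

Over this range each additional block of 20 subjects per group buys less power than the previous one, which is the flattening of the curve described above.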
Post hoc power analysis
In some situations researchers have no choice of sample size. For example, the number of participants may have been pre-determined by the project sponsor. In this case, power analysis should still be conducted to find out what the power level is given the pre-set sample size. If the power level is low, it may be an explanation for a non-significant result. What if the null hypothesis is rejected? Does it imply that power is adequate and no power analysis is needed? Granaas (1999) suggested that there is a widespread misconception that a significant result proves an adequate power level. Actually, even if power is .01, a researcher can still correctly reject the null one out of one hundred times. This is a typical example of the logical error of affirming the consequent: from "if P then Q" it does not follow that "if Q then P."
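When the sample size is fixed in advance, the power level can also be estimated by simulation rather than by formula. The following is a minimal Monte Carlo sketch under assumptions of my own: normally distributed scores, a known common standard deviation, and a two-sided z-test. The function name simulated_power, the seed, and the design values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_power(mean_diff, sd, n_per_group, alpha=0.05,
                    n_sims=4000, seed=1):
    """Share of simulated studies that reject the null (two-sided z-test)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(mean_diff, sd) for _ in range(n_per_group)]
        z_stat = (fmean(treated) - fmean(control)) / se
        if abs(z_stat) > crit:
            rejections += 1
    return rejections / n_sims


# Sponsor-imposed design: 20 subjects per group, true difference of half an SD.
estimate = simulated_power(mean_diff=5, sd=10, n_per_group=20)
print(f"estimated power with 20 per group: {estimate:.2f}")
```

In a design like this one, most simulated studies fail to reject the null even though the effect is real, yet a sizable minority do reject it, so a significant result by itself says little about whether power was adequate.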
When power is insufficient but increasing sample size is not an option, you can make up subjects. Don't worry. It is a legitimate and ethical procedure. A resampling technique named bootstrapping can be employed to create a larger virtual population by duplicating existing subjects. Analysis is conducted with simulated subjects drawn from the virtual population. Resampling procedures will be discussed in another section.
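The mechanics of resampling can be sketched briefly. This only illustrates drawing simulated samples with replacement from the observed subjects and is not the full procedure covered in the other section. The scores, the seed, and the function name bootstrap_means are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_means(sample, n_boot=2000, seed=42):
    """Means of n_boot resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    return [fmean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]


scores = [12, 15, 9, 20, 14, 11, 17, 13]  # hypothetical observed scores
means = sorted(bootstrap_means(scores))
# Percentile interval: cut off 2.5% of the resampled means in each tail.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]

print(f"observed mean = {fmean(scores):.2f}")
print(f"95% percentile interval from resampling: ({lower:.2f}, {upper:.2f})")
```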
Last but not least, low power does not necessarily make your study a poor one if you found a significant difference. Yes, even if the null is rejected, the power may still be low. But this can be interpreted as a strength rather than as a weakness. Power is the probability of detecting a true difference. If I don't have adequate power, I may not find a significant result. But if I can detect a difference in spite of low power, what does it mean? Suppose it takes at least 20 gallons of gasoline for a vehicle with an efficient engine to go from Phoenix to LA. If I can do it with 15 gallons, it seems that the engine is really efficient!
Reading
For an overview of power analysis, I recommend "An introduction to power analysis," a book chapter written by Welkowitz, Ewen, and Cohen (1982). To go beyond the basics, please consult Lipsey (1990).
If you are interested in learning power analysis through visualization and simulation, please download the program Power-sim (Mac OS
9 or earlier only; sorry, I cannot keep up with OS X and beyond), which was programmed by Dr. John Behrens and me.
Instructions for downloading and running the program:
- Download the compressed program. The archive should be decompressed by your web browser.
- Open Xlisp-stat 50 PPC in the folder R-code.
- Type (menu) at the > prompt (the parentheses must be included).
- Choose Power Simulation from the popup menu.
- Choose the indicators (e.g. power, beta, etc.) on the right panel.
- Manipulate effect size, sample size, and alpha level on the left.
Do not take power analysis lightly. This concept is misunderstood by many students and researchers. If you are interested in knowing what those misconceptions are, please read "Identification of misconceptions concerning statistical power with dynamic graphics as a remedial tool," an article written by Dr. John Behrens and me (Yu & Behrens, 1995).
References
- Borenstein, M., Cohen, J., & Rothstein, H. (1997). Power and precision. Dataxiom, Inc. [On-line] Available URL: http://www.dataxiom.com
- Centers for Disease Control and Prevention. (2007). Epi Info. [On-line] Available URL:
http://www.cdc.gov/epiinfo/
- Clark-Carter, D. (1997). The account taken of statistical power in research published in the British Journal of Psychology. British Journal of Psychology, 88, 71-83.
- Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
- Environmental Protection Agency. (2007). Statistical power analysis. Retrieved April 21, 2008 from
http://www.epa.gov/bioindicators/statprimer/power.html
- Friendly, M. (1991). SAS macro programs: mpower. [On-line] Available URL: http://www.math.yorku.ca/SCS/sasmac/mpower.html.
- G*Power (2000). [Online]. Available URL: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
- Granaas, M. (1999, January 7). Re: Type I and Type II error. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: edstat-l@jse.stat.ncsu.edu [1999, January 7].
- Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental design. Newbury Park: Sage Publications.
- Muller, K. E., & Lavange, L. M. (1992). Power calculations for general linear multivariate models including repeated measures applications. Journal of the American Statistical Association, 87, 1209-1216.
- NCSS Statistical Software. (2008). PASS [Computer software]. Kaysville, UT: The Author.
- Ottenbacher, K. J. (1996). The power of replications and replications of power. The American Statistician, 50, 271-275.
- Schmidt, F. L., & Hunter, J. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
- Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007).
Estimating causal effects using experimental and observational designs: A think tank white paper. Washington, D.C.: American Educational Research Association.
- Simon, S. (1999, January 7). Re: Type I and Type II error. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: edstat-l@jse.stat.ncsu.edu [1999, January 7].
- StataCorp. (2007). Stata 10. [Computer software and manual]. College Station, TX: The Author.
- Welkowitz, J., Ewen, R. B., & Cohen, J. (1982). Introductory statistics for the behavioral sciences. San Diego, CA: Harcourt Brace Jovanovich, Publishers.
- Yu, C. H., & Behrens, J. T. (1995). Identification of misconceptions concerning statistical power with dynamic graphics as a remedial tool. Proceedings of 1994 American Statistical Association Convention. Alexandria, VA: ASA.
Last updated: June 2009