Multi-collinearity, Variance Inflation
and Orthogonalization in Regression


Chong Ho (Alex) Yu, Ph.D., D. Phil. (2022)

Mathematical dependence and logical dependence

A regression model with too many predictors may be problematic. But even a model with as few as four independent variables can suffer from collinearity when a composite score is included. The following is a typical example:

GPA = GRE-verbal + GRE-quantitative + GRE-analytical + GRE-total
In the above example, GRE-total is simply the sum of the other three predictors. Needless to say, GRE-total is strongly associated with those variables; in fact, it is an exact linear combination of them. Technically speaking, GRE-total and its components are both mathematically and logically dependent. Mathematically, the value of GRE-total is fully determined by the values of the others. Logically, GRE-total is not a new concept.
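The mathematical dependence can be checked numerically. The following is a minimal sketch, assuming simulated scores (the means, standard deviations, sample size, and variable names are invented for illustration and are not real GRE data). It builds the design matrix for the model above and compares the number of columns with the matrix rank.

```python
import numpy as np

# Hypothetical, simulated sub-scores; the distributions are illustrative only.
rng = np.random.default_rng(0)
n = 200
verbal = rng.normal(150, 8, n)
quantitative = rng.normal(152, 9, n)
analytical = rng.normal(4, 0.8, n)

# The composite score is the exact sum of the three sub-scores.
total = verbal + quantitative + analytical

# Design matrix: intercept plus the four predictors of the GPA model.
X = np.column_stack([np.ones(n), verbal, quantitative, analytical, total])

n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)

# A rank lower than the number of columns means that at least one column
# is a linear combination of the others, so the least-squares
# coefficients are not uniquely determined.
print("columns:", n_columns, "rank:", rank)
```

If the composite column is dropped, the same check should report a rank equal to the number of columns.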

However, the following model is legitimate even though strong relationships exist among the predictors:

GPA = time spent with family + time spent in church + (time spent with family * time spent in church)
The researcher created the last variable because he suspected that GPA is a function of the interplay between family values and Christian ethics. In this case the predictors are mathematically dependent but logically independent. Mathematically speaking, the interaction term is the product of the first two variables, so it certainly has a strong numeric relationship with them. Conceptually speaking, the interaction is considered a new variable and thus it is logically independent of the others. But when a regression model is built, will these close relationships lead to collinearity and affect the model's stability?

For Althauser (1971), the answer is "yes," and thus he questioned the appropriateness of using interaction variables in a regression model. Actually, when a regression model involves an interaction effect, the regression plane is no longer flat. Rather, it is curvilinear, as shown in the following left panel. Let's use the finger-and-paper analogy again. In the right picture the paper is curved, and my fingers (data points) are also curved around the paper. Even though my fingers are close to each other, the plane is still well-supported.
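A small simulation may help to separate a strong numeric relationship from an exact dependence. The sketch below assumes two independent, uniformly distributed time variables (the ranges, the sample size, and the variable names are hypothetical). It reports how strongly the product term correlates with its components, computes the variance inflation factor of the product term as 1 / (1 - R²), and checks that the design matrix still has full column rank.

```python
import numpy as np

# Hypothetical weekly hours; the ranges are illustrative assumptions.
rng = np.random.default_rng(42)
n = 500
family = rng.uniform(5, 30, n)
church = rng.uniform(1, 10, n)
interaction = family * church

# Correlation of the product term with each of its components.
r_family = np.corrcoef(family, interaction)[0, 1]
r_church = np.corrcoef(church, interaction)[0, 1]

# VIF of the product term: regress it on the other predictors
# (with an intercept) and compute 1 / (1 - R^2).
others = np.column_stack([np.ones(n), family, church])
beta, *_ = np.linalg.lstsq(others, interaction, rcond=None)
residuals = interaction - others @ beta
r_squared = 1 - residuals.var() / interaction.var()
vif = 1 / (1 - r_squared)

# Unlike a composite sum, the product is not a linear combination
# of its components, so the full design matrix keeps full column rank.
design = np.column_stack([others, interaction])
rank = np.linalg.matrix_rank(design)

print("r(family, product):", round(r_family, 3))
print("r(church, product):", round(r_church, 3))
print("VIF of product term:", round(vif, 2))
print("columns:", design.shape[1], "rank:", rank)
```

Under these assumptions the product term is clearly correlated with both components and its VIF is inflated, yet the coefficients remain estimable because the dependence is not linear.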

Why is the interaction variable expressed in the form of a product term?

Once a student asked me, "Why do you multiply two variables to create an interaction variable?" Good question. When a variable X is said to interact with another variable Z, it may be that the relationship between a dependent variable Y and the independent variable X is conditioned by a moderating variable Z. The following equations express their relationships:

Y = a + bX + e [equation 1]
a = c1 + c2Z [equation 2]
b = d1 + d2Z [equation 3]

When we substitute [2] and [3] into [1], we have:

Y = (c1 + c2Z) + (d1 + d2Z)X + e [equation 4]
Y = c1 + c2Z + d1X + d2ZX + e [equation 5]

That's why an interaction variable is a product term. For details, please consult Fisher (1988).
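The substitution can also be verified by simulation. The following is a minimal sketch, assuming arbitrary true values for c1, c2, d1, and d2 and normally distributed X, Z, and e (all of these choices are illustrative). The data are generated from equations 1 to 3, in which the intercept and the slope each depend on Z, and then fitted with the product-term model of equation 5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Assumed true parameters, chosen only for illustration.
c1, c2, d1, d2 = 2.0, 0.5, 1.0, -0.3

X = rng.normal(0, 1, n)
Z = rng.normal(0, 1, n)
e = rng.normal(0, 0.1, n)

# Equations 2 and 3: the intercept and the slope are functions of Z.
a = c1 + c2 * Z
b = d1 + d2 * Z

# Equation 1: Y depends on X through the Z-dependent intercept and slope.
Y = a + b * X + e

# Equation 5: regress Y on Z, X, and the product term ZX.
design = np.column_stack([np.ones(n), Z, X, Z * X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

# The estimates are ordered as c1, c2, d1, d2.
print("estimated:", np.round(coef, 3))
print("true:     ", [c1, c2, d1, d2])
```

The coefficient of the product term estimates d2, the amount by which the slope of X changes for each unit change in Z.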

