VIF and multicollinearity

Postby ajkugel » Tue Oct 21, 2008 10:29 pm

Could someone explain to me the significance and/or give me the idiot's guide to what VIF and multicollinearity are, where they come from, and how they are important when analyzing a design?

Thanks

Alex
Alex Kugel
Combinatorial Materials Research Lab
North Dakota State University
Fargo, ND
ajkugel
Registered Member
 
Posts: 11
Joined: Tue Oct 07, 2008 8:14 pm
Location: Fargo ND

Postby Wayne » Thu Oct 23, 2008 4:09 pm

Multicollinearity is when the factors are more correlated with each other than they are with the responses being measured. When this happens, the models usually predict the responses okay, but the cause-and-effect relationship assumed to exist between the factors and the responses is somewhat suspect. When factors are perfectly correlated with each other, they are aliased.

Multicollinearity extends to linear combinations of factors as well. In other words, one factor can be regressed on (predicted by) some combination of the other factors.
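To make the "regressed on the other factors" idea concrete, here is a minimal sketch of the usual textbook definition of the variance inflation factor: regress factor j on all the other factors, take the R-squared of that fit, and compute VIF_j = 1 / (1 - R_j^2). This is not Design-Expert's code. The function names (solve, r_squared, vif) are made up for illustration, and the sketch assumes no column is constant and that the remaining columns are not themselves perfectly collinear.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the remaining columns are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(y, predictors):
    """R^2 from regressing y on an intercept plus the given predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[k] * cols[k][i] for k in range(len(cols))) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # assumes y is not constant
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF for each column: 1 / (1 - R_j^2), with R_j^2 from regressing
    column j on all the other columns. Perfect collinearity gives infinity."""
    result = []
    for j, y in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        r2 = r_squared(y, others)
        result.append(float("inf") if 1.0 - r2 < 1e-12 else 1.0 / (1.0 - r2))
    return result


# Coded factor columns of a 2x2 factorial (orthogonal), one list per factor.
factorial = [[-1, 1, -1, 1], [-1, -1, 1, 1]]
# Two made-up factors that move almost in lockstep.
correlated = [[1, 2, 3, 4], [1, 2, 3, 5]]
print(vif(factorial))
print(vif(correlated))
```

An orthogonal design gives a VIF of 1 for every factor, the made-up correlated pair gives a much larger value, and an exactly aliased column gives infinity. The same calculation can be applied to any columns of the model matrix (interactions, squared terms), not just the main factors. A VIF above roughly 10 is a commonly quoted warning level, but that is a rule of thumb, not a hard limit.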

In Design-Expert you can right-click on a number in the report and select "help" for an explanation and sometimes even the formula for its calculation. VIF is a prime candidate for this.
Wayne
Stat-Ease Consultant
 
Posts: 258
Joined: Tue May 27, 2008 8:31 am

Postby Dick » Fri Apr 03, 2009 11:30 am

The linear regression algorithm used in RSM analysis minimizes residuals at the design sites of the inference space. A common assumption is that the residuals at these sites describe the unexplained variance that would be associated with model predictions elsewhere in the design space. That tends to be less true when substantial multicollinearity is in play. Under those circumstances, the prediction errors can be substantially larger at off-design points, where the model might be applied for general response prediction.

For this reason, we now include as a matter of course a number of confirmation points distributed randomly within the test matrix. These points consist of randomly selected factor levels within the ranges tested; they are not used to fit the response model, but to test it. We often find significant differences in residual variance at the confirmation points relative to the design points, especially when the RSM data are overfitted or when variance inflation factors are high. We use this in conjunction with subject-matter expertise to reformulate our RSM models by eliminating selected high-order terms that display high multicollinearity (as evidenced by the VIF numbers). This generally leads to a more robust model: one with a more uniform distribution of actual prediction errors, as quantified directly via the confirmation points.
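The design-point versus confirmation-point comparison described in this post can be sketched roughly as follows. This is only an illustration under assumptions: the fitted model is taken to be available as a plain function (here called predict), each run is a (factor levels, measured response) pair, and the names rms_residual and confirmation_ratio are hypothetical. It uses a simple root-mean-square with n in the denominator and no degrees-of-freedom correction for the design points.

```python
import math


def rms_residual(predict, runs):
    """Root-mean-square of (measured - predicted) over (factors, response) pairs."""
    squared = [(measured - predict(x)) ** 2 for x, measured in runs]
    return math.sqrt(sum(squared) / len(squared))


def confirmation_ratio(predict, design_runs, confirmation_runs):
    """Ratio of RMS residual at confirmation points to RMS residual at design points.

    A ratio well above 1 suggests the fit residuals understate the real
    prediction error away from the design sites. Assumes the design-point
    residuals are not all exactly zero.
    """
    return rms_residual(predict, confirmation_runs) / rms_residual(predict, design_runs)


# Made-up one-factor example: a fitted line y = 2*x + 1.
def predict(x):
    return 2.0 * x[0] + 1.0


design_runs = [((0.0,), 1.1), ((1.0,), 2.9)]             # points used to fit the model
confirmation_runs = [((0.5,), 2.3), ((0.25,), 1.2)]      # held-out points, never fitted
print(confirmation_ratio(predict, design_runs, confirmation_runs))
```

With the made-up numbers above, the confirmation residuals are about three times the design residuals, which is the kind of gap that would prompt a second look at the high-VIF terms in the model.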

MEMO TO STATEASE: The existing point-prediction capability in DX7 is really great for individual points, but some of our people are going so far as to replicate the rest of the DX7 functionality in MATLAB in order to obtain relief from the onerous chore of examining multiple prediction errors one point at a time (OPAT? :) ) for large numbers of confirmation points. It is not uncommon to have scores of such points in a large-scale wind tunnel test. Would you PLEASE consider a batch-mode point-prediction capability in which multiple confirmation runs might be pasted into a spreadsheet along with their measured responses, to be compared in one fell swoop with model predictions for those points? A report that shows the same information as the current point-prediction, but for multiple points, would be just the ticket. A count of residual magnitudes less than the 95% prediction interval half-width would be a valuable extra. Thanks!
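For anyone doing this outside the software in the meantime, the batch comparison requested above might look something like this sketch. It is not a DX7 feature or API. It assumes the fitted model and the 95% prediction interval half-width are available as plain functions of the factor levels (predict and half_width here, both hypothetical), and that the confirmation runs are (factor levels, measured response) pairs.

```python
def batch_check(predict, half_width, confirmation_runs):
    """Compare many confirmation runs against the model in one pass.

    Returns one report row per run, plus the count of runs whose absolute
    residual is within the prediction interval half-width.
    """
    rows = []
    for x, measured in confirmation_runs:
        predicted = predict(x)
        residual = measured - predicted
        hw = half_width(x)
        rows.append({
            "factors": x,
            "measured": measured,
            "predicted": predicted,
            "residual": residual,
            "pi_half_width": hw,
            "inside": abs(residual) <= hw,
        })
    n_inside = sum(1 for row in rows if row["inside"])
    return rows, n_inside


# Made-up model and a constant half-width, purely for illustration.
def predict(x):
    return 2.0 * x[0] + 1.0


def half_width(x):
    return 0.25


confirmation_runs = [((0.5,), 2.1), ((0.25,), 1.2), ((0.75,), 2.4)]
rows, n_inside = batch_check(predict, half_width, confirmation_runs)
for row in rows:
    print(row)
print(n_inside, "of", len(rows), "residuals inside the prediction interval")
```

In practice the half-width varies from point to point, so it would have to come from whatever software fitted the model; the constant used here is only a placeholder.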
Dick
Registered Member
 
Posts: 5
Joined: Wed Oct 08, 2008 8:51 am
Location: Hampton, VA

Postby Tryg » Mon Apr 06, 2009 12:53 pm

Dick,

Thank you for your comments and suggestion on improving the utility of the point prediction. This topic has been linked to the Suggestions & Feedback forum for review and consideration.

Tryg
Tryg
Stat-Ease Developer
 
Posts: 115
Joined: Thu Feb 28, 2008 3:03 pm
Location: Minneapolis, MN

