For better accuracy when fitting a linear model, we drop input variables (features) that are not significant (i.e. p-value > 0.05) and those whose variance inflation factor (VIF) is greater than 5. I have ended up in a case where two input variables are highly correlated with each other and both are significant. Which one should I drop?

## 1 comment

One of the assumptions of linear regression is that there is no multicollinearity among the input variables, so you will need to remove it if you are building a linear regression model. First remove the variables that are causing multicollinearity, then check the remaining variables for significance.
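
To illustrate the order of operations described above (deal with multicollinearity first, then look at significance), here is a minimal sketch that computes the VIF of each feature with plain NumPy, using VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features. The data, the helper name `vif`, and the choice to drop `x2` rather than `x1` are all assumptions made for illustration and are not part of the original question.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress feature j on an intercept plus all the other features.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (an assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif_all = vif(X)                 # x1 and x2 should both exceed the threshold of 5
vif_reduced = vif(X[:, [0, 2]])  # VIFs after dropping x2 (an arbitrary choice here)

print("VIF with all features:", vif_all)
print("VIF after dropping x2:", vif_reduced)
```

Once the remaining VIFs are below the threshold, the model can be refit on the reduced feature set and the p-values checked again.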

Alternatively, if you do not want to remove variables that cause multicollinearity but turn out to be significant, you can try Principal Component Analysis (PCA) for dimensionality reduction, provided its assumptions hold for your data. Note: PCA has a few assumptions; you can read more about them at https://statistics.laerd.com/spss-tutorials/principal-components-analysis-pca-using-spss-statistics.php
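
As a rough sketch of the PCA route mentioned above, the snippet below standardizes two highly correlated features and projects them onto their principal components using NumPy's SVD. The synthetic data and variable names are assumptions for illustration only. The resulting component scores are uncorrelated with each other, so they could be used as regression inputs in place of the original correlated pair.

```python
import numpy as np

# Synthetic data (an assumption): two highly correlated input variables.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Standardize each column so both features contribute on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition; rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                    # principal component scores
explained = s**2 / np.sum(s**2)      # share of variance per component

# Correlation between the two component scores (should be numerically zero).
corr = np.corrcoef(scores, rowvar=False)[0, 1]

print("Explained variance ratio:", explained)
print("Correlation between components:", corr)
```

The trade-off is interpretability: each component is a mix of the original variables, so its coefficient no longer refers to a single input.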