Intercept in linear regression model

Why do we need an intercept in a linear regression model?

If we use statsmodels.OLS, do we need to add the intercept explicitly, and how do we do it?

2 comments

  1. The intercept is the value of y when all the X's are 0, i.e., the predicted value of the target variable y when X1, X2, X3, …, Xn are all zero. In two-dimensional space, the y-intercept is where the regression line cuts the y-axis (the point where x = 0).

    For example, suppose we fit a regression line to predict the marks obtained on a test from the number of hours of study:

    y = 20 + 0.6x, where y is the marks and x is the number of hours of study.

    Here 20 is the y-intercept: a student is predicted to obtain 20 marks even with zero hours of study.

    However, when zero lies outside the range of predictor values used to build the model, the y-intercept will not make much sense in the context of the problem.

    statsmodels, by default, does not include a y-intercept, so it fits a line passing through the origin.

    To include the y-intercept, we add a column of ones to the predictors with the function add_constant(); see the first sketch after these comments.

  2. If we remove the intercept, the regression line is forced through the origin, i.e., the model predicts that the dependent variable is zero whenever all the independent variables are zero. In a regression model it is good practice to include an intercept unless you are specifically asked to make the regression line pass through the origin; the second sketch below illustrates the difference.

    In statsmodels.OLS, the intercept is not added by default and has to be added by the user. You can add it using:

    X = sm.add_constant(X)

    OR

    sm.OLS(y, statsmodels.tools.add_constant(X))
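
Together, the comments suggest a small experiment. Below is a minimal runnable sketch of the first comment's example, using simulated data (the names hours and marks are made up for illustration): we generate marks from y = 20 + 0.6x plus noise, fit OLS with an added constant, and recover an intercept close to 20.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    hours = rng.uniform(1, 10, size=200)                   # hours of study (simulated)
    marks = 20 + 0.6 * hours + rng.normal(0, 2, size=200)  # true model: y = 20 + 0.6x + noise

    X = sm.add_constant(hours)     # prepends a column of ones for the intercept
    model = sm.OLS(marks, X).fit()
    print(model.params)            # [const, slope] -- roughly [20, 0.6]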
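
A second sketch, on the same simulated data, illustrates the second comment's point: dropping the constant forces the fitted line through the origin, and the slope inflates to absorb the missing intercept.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    hours = rng.uniform(1, 10, size=200)
    marks = 20 + 0.6 * hours + rng.normal(0, 2, size=200)

    with_const = sm.OLS(marks, sm.add_constant(hours)).fit()
    no_const = sm.OLS(marks, hours).fit()  # no column of ones: line forced through the origin

    print(with_const.params)  # intercept ~20, slope ~0.6
    print(no_const.params)    # single slope, inflated to absorb the missing intercept
    # Note: statsmodels reports uncentered R-squared when there is no constant,
    # so with_const.rsquared and no_const.rsquared are not directly comparable.

As an aside, the formula interface (statsmodels.formula.api.ols) adds the intercept automatically, so add_constant() is only needed with the array interface shown here.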
