

## Dropping variables that are not significant and have high VIF values

## Dipayan Sarkar


One of the assumptions of linear regression is no multicollinearity, so you will need to remove multicollinearity if you are building a linear regression model. After removing the variables that cause multicollinearity, check for variable significance.

Also, if a variable that causes multicollinearity turns out to be significant and you do not want to remove it, then, subject to the applicable assumptions, you can also try Principal Component Analysis for dimensionality reduction. Note: PCA has a few assumptions – you can read more on this at https://statistics.laerd.com/spss-tutorials/principal-components-analysis-pca-using-spss-statistics.php
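As a minimal sketch of the VIF-based dropping described above (the column names, toy data, and the common threshold of 5 are my own assumptions, not from the answer), one can iteratively drop the predictor with the highest VIF:

```python
# Hypothetical sketch: iteratively drop the predictor with the highest VIF.
# Data, column names, and the threshold of 5 are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 0.9 + rng.normal(scale=0.1, size=100),  # nearly collinear with x1
    "x3": rng.normal(size=100),                        # independent predictor
})

def drop_high_vif(X, threshold=5.0):
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            return X
        X = X.drop(columns=[vifs.idxmax()])  # drop the worst offender and recheck

X_reduced = drop_high_vif(X)
print(list(X_reduced.columns))  # one of the collinear pair is gone, x3 survives
```

After the collinear column is removed, significance checks (t-tests) can be run on the reduced set, as the answer suggests.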

## Multiple Linear Regression

## Dipayan Sarkar


The statsmodels.regression.linear_model.OLSResults.pvalues attribute should give you the p-values of the respective variables; pvalues[0] should give you the p-value of the 1st variable. You can filter out the names of the variables wherever pvalues[i] < 0.05 and then use that list of variable names to filter the data as needed.

## Scaling for numeric variables

## Hasnain

It is possible to perform numerical scaling after the train-test split. In fact, since the test data is unseen data that we use for further predictions, we do not want to access it during the training stage. So it is preferred to split first and then fit the scaler on the training set only, reusing the same fitted scaler to transform the test set.
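A short sketch of that order of operations, assuming scikit-learn and toy data:

```python
# Split first, fit the scaler on the training data only, then reuse it
# on the test data. Data and split sizes are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics come from training data only
X_test_scaled = scaler.transform(X_test)        # test data never influences the scaler

print(X_train_scaled.mean(axis=0))  # ~0 by construction on the training set
```

Fitting on the training set alone keeps the test set genuinely unseen during training.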

## Linear regression with multiple variables

## Suchita


If this particular variable is essential, it should be included in the model. You may go ahead and build the model including this variable. However, you should check the correlation of this variable with the other predictor variables, and drop any variable that is highly correlated with it, because highly correlated variables provide similar information and hence lead to multicollinearity.

Also, after building the model, check if this particular variable is statistically significant or not and take appropriate action in the next version of the model.

Additionally, try Lasso regression. It is an intrinsic method of feature selection; see whether this model keeps the variable in question.
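As an illustrative sketch of Lasso as intrinsic feature selection (the data, the alpha value, and which feature "matters" are my own assumptions for demonstration):

```python
# Lasso shrinks coefficients of uninformative features to exactly zero,
# which acts as built-in feature selection. Data and alpha are assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=200)  # only feature 0 matters

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # irrelevant features are shrunk to exactly zero

kept = [i for i, c in enumerate(lasso.coef_) if c != 0.0]
print(kept)  # indices of the features the model retained
```

If the variable in question survives the shrinkage, that is additional evidence it carries information about the response.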

## Intercept in linear regression model

## Suchita


The intercept is the value of y when all Xs are 0. So the y-intercept is the predicted value of y when X1, X2, X3, ..., Xn are all zero. In two-dimensional space, the y-intercept is where the regression line cuts the y-axis (the value of x is 0 at this point).

Y-intercept is interpreted as the value of the target y-variable when all the predictors are 0.

For example, suppose we fit a regression line to predict the marks obtained in a test based on the number of hours of study:

y = 20 + 0.6x, where y is the marks and x is the number of hours of study.

20 is the y-intercept: it means a student will obtain 20 marks even if he does not study.

However, when zero is outside the range of predictor values used to build the model, the y-intercept will not make much sense in the context of the problem.

statsmodels, by default, fits a line passing through the origin, i.e., no y-intercept is included by default.

To include a y-intercept, we use the function add_constant().

## Constant in Logistic Regression

## Dipayan Sarkar

When all your independent variables are zero, a logistic regression without an intercept will predict a probability of 1/2. To avoid this, adding an intercept is required: the intercept lets the model predict a baseline class probability instead of simply returning 1/2.
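A quick check of that claim in plain Python (the intercept value of -2.0 is an assumed number for illustration): the logistic (sigmoid) function maps 0 to 0.5, so with all features zero and no intercept the linear term is 0 and the predicted probability is 1/2.

```python
# The sigmoid maps the linear term z to a probability; z = 0 gives 0.5.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# No intercept and all features zero: z = 0 -> probability 0.5
print(sigmoid(0.0))  # 0.5

# With an assumed intercept b0 = -2.0, the baseline probability shifts:
print(round(sigmoid(-2.0), 4))  # 0.1192
```

The intercept therefore encodes the baseline log-odds of the positive class when all predictors are zero.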

## P-Value in Linear Regression

## Suchita


If the p-value for the F-statistic is less than 0.05, we reject the null hypothesis, which means that at least one beta coefficient is not zero. We conclude that the overall model is significant.

By checking the F-statistic we concluded that the overall model is statistically significant; however, we still need to identify whether any of the predictors included in the model are unrelated to the response variable.

We check the t-statistics to determine which predictor variables are NOT related to the response variable and which are statistically significant predictors of the response variable.

Alternatively, one might argue: if we are already checking the t-statistic of each individual predictor, and even one predictor is significant, then the overall model should be considered significant or valid, so why check the F-statistic for the overall validity of the model?

That is because about 5% of predictor variables will appear significant by sheer chance (at a 95% confidence level). This is especially true for models with many predictor variables. The F-statistic does not suffer from this, as it adjusts for the number of predictors. Hence we confirm the overall validity of the model using the F-statistic.

## Model Overfitting or Underfitting

## Suchita


If the model performs well on the training dataset but does not perform well on the test set, it is an indication of overfitting, as the model is unable to generalise on the unseen data.

If the model performance is poor on both the training and the test set, then the model is underfitting. It indicates that the model is neither able to capture the underlying patterns in the data nor able to generalise to unseen data.
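A minimal sketch of this diagnosis, with a model and data deliberately assumed to exaggerate overfitting:

```python
# Diagnosing overfitting by comparing train vs. test R^2 scores.
# The unconstrained tree and noisy data are assumptions chosen to
# make the train/test gap obvious.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)  # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor().fit(X_train, y_train)  # unlimited depth memorises noise
print(tree.score(X_train, y_train))  # near-perfect on training data
print(tree.score(X_test, y_test))    # much lower on unseen data: overfitting
```

A large gap between the two scores signals overfitting; two low scores together would instead signal underfitting.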

## Statistically significant difference

## Prasad Valse


The term "statistically significant" simply means that the result isn't due to chance and you can feel confident that it's real, not that you just got lucky. So the next time you hear someone say that a result is statistically significant, you can be "almost" sure that the results they observed are reliable. Notice that I have quoted the word 'almost' in the previous statement; that's because statistical significance depends on several parameters:

- The confidence level
- Your sample size

Lastly, the statistical significance of the results of an experiment is often calculated with hypothesis testing, which tests the validity of a hypothesis by figuring out the probability that your results have happened by chance.
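An illustrative sketch of such a hypothesis test (scipy, the simulated sample, and the 0.05 significance level are my assumptions, not from the answer):

```python
# One-sample t-test: is the sample mean significantly different from 5.0?
# The simulated data and the 0.05 level are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sample = rng.normal(loc=6.0, scale=1.0, size=50)  # true mean shifted away from 5.0

# H0: the population mean is 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(p_value)

if p_value < 0.05:
    print("Statistically significant: unlikely to be due to chance alone")
else:
    print("Not statistically significant at the 95% confidence level")
```

A small p-value says the observed difference would be very unlikely if chance alone were at work, which is exactly the "not just lucky" intuition above.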

## How can I validate user input in Python

## Shivam17


To continue prompting for input instead of getting the program terminated, we need to use a try ... except block. Try/except blocks are used for exception handling.

Whenever an error occurs, the except block handles it, and whenever the input is correct, the loop breaks.

Example:

```python
while True:
    try:
        integer_input = int(input('Enter a number: '))
    except ValueError:
        print('Input should be a number !!!')
    else:
        print('\n\nCorrect Input ...')
        break

print('The number entered is', integer_input)
```

In the above program, we need an integer/numeric input. If a user enters 'abcd', instead of terminating, the program goes to the except block, and the loop continues until the user enters a valid integer. As soon as the user enters, say, 43, the loop breaks.
