Algoritmo Lab Forum

  1. Asked: June 17, 2021 In: Linear Regression

    Scaling for numeric variables

    Suchita SME
    Added an answer on June 18, 2021 at 11:35 am

    When we scale the data before the train-test split, we cause indirect data leakage. The scaler uses the global mean and standard deviation (for standardisation) or the global minimum and maximum (for normalisation), so some information about the hold-out sample is captured in those summary statistics and made available to the model through the training dataset.

    Ideally, the transformation should be fit on the training dataset only, and the fitted transform should then be applied to both the train and test datasets. This avoids indirect data leakage and guards against over-optimistic evaluation results.

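    For example:

    from sklearn.preprocessing import MinMaxScaler

    # Fit the scaler on the training data only
    my_scaler = MinMaxScaler()
    my_scaler.fit(X_Train)

    # Apply the same fitted scaler to both sets
    X_Train = my_scaler.transform(X_Train)
    X_Test = my_scaler.transform(X_Test)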
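To make the fit/transform idea concrete, here is a minimal NumPy sketch of the same arithmetic done by hand. The synthetic data, the 80/20 split and the helper name min_max_scale are assumptions for illustration only; this is not how MinMaxScaler is implemented internally, just the same idea of learning the minimum and maximum from the training rows and reusing them on the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 1))  # synthetic feature (assumed)
X_train, X_test = X[:80], X[80:]                      # simple 80/20 split (assumed)

# Statistics learned from the training rows only (the "fit" step)
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)


def min_max_scale(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1 (hypothetical helper)."""
    return (values - lo) / (hi - lo)


# The "transform" step reuses the training statistics for both sets
X_train_scaled = min_max_scale(X_train, train_min, train_max)
X_test_scaled = min_max_scale(X_test, train_min, train_max)

print("train range:", X_train_scaled.min(), X_train_scaled.max())
print("test range: ", X_test_scaled.min(), X_test_scaled.max())
```

The scaled test values may fall slightly outside [0, 1]. That is expected: the test set was never consulted when the minimum and maximum were computed.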

  2. Asked: June 15, 2021 In: Linear Regression

    Linear regression with multiple variables

    Best Answer
    Suchita SME
    Added an answer on June 16, 2021 at 5:55 am

    If this particular variable is essential, it should be included in the model, so go ahead and build the model with it. However, check its correlation with the other predictor variables and drop any predictor that is highly correlated with it, because highly correlated variables carry similar information and lead to multicollinearity.

    Also, after building the model, check whether this variable is statistically significant and take appropriate action in the next version of the model.

    Additionally, try Lasso regression. It is an intrinsic (embedded) method of feature selection, so you can see whether the fitted model has retained the variable in question.
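The correlation check described above could look roughly like the sketch below. The synthetic columns, their names and the 0.8 cut-off are all assumptions chosen for illustration; pick a threshold that suits your own data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (assumed): one essential variable, one near-duplicate
# of it, and one unrelated variable.
essential = rng.normal(size=n)
near_copy = essential + 0.1 * rng.normal(size=n)
unrelated = rng.normal(size=n)

X = np.column_stack([essential, near_copy, unrelated])
names = ["essential", "near_copy", "unrelated"]

# Pairwise Pearson correlations between the columns of X
corr = np.corrcoef(X, rowvar=False)

threshold = 0.8  # assumed cut-off for "highly correlated"

# Keep the essential variable (column 0); flag other predictors that are
# highly correlated with it as candidates to drop.
to_drop = [
    names[j] for j in range(1, X.shape[1]) if abs(corr[0, j]) > threshold
]
print("candidates to drop:", to_drop)
```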

  3. Asked: June 16, 2021 In: Linear Regression

    Intercept in linear regression model

    Suchita SME
    Added an answer on June 16, 2021 at 5:41 am

    The intercept is the predicted value of the target y when all the predictors X1, X2, X3, …, Xn are zero. In 2-dimensional space, the y-intercept is where the regression line cuts the y-axis (x = 0 at this point).

    For example, suppose we fit a regression line to predict the marks obtained in a test from the number of hours of study:

    y = 20 + 0.6x, where y is the marks and x is the number of hours of study.

    Here 20 is the y-intercept: the model predicts that a student will obtain 20 marks even without studying.

    However, when zero lies outside the range of predictor values used to build the model, the y-intercept does not make much sense in the context of the problem.

    The statsmodels OLS class does not add an intercept by default, so it fits a line passing through the origin (the formula interface adds one automatically).

    To include the y-intercept with OLS, add a constant column to the predictors using the function add_constant().
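As a rough illustration of what adding a constant column does, here is a plain NumPy sketch using the marks example (y = 20 + 0.6x). It uses np.linalg.lstsq rather than statsmodels, and the hours values are made up; the column of ones plays the same role as the column that add_constant() prepends.

```python
import numpy as np

# Hours of study (assumed values) and noise-free marks from y = 20 + 0.6x
hours = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
marks = 20.0 + 0.6 * hours

# Without a constant column the fitted line is forced through the origin
X_no_const = hours.reshape(-1, 1)
slope_origin = np.linalg.lstsq(X_no_const, marks, rcond=None)[0][0]

# With a column of ones the model can also estimate an intercept
X_const = np.column_stack([np.ones_like(hours), hours])
intercept, slope = np.linalg.lstsq(X_const, marks, rcond=None)[0]

print("through origin: slope =", round(slope_origin, 3))
print("with constant:  intercept =", round(intercept, 3), "slope =", round(slope, 3))
```

Forcing the line through the origin inflates the slope here, because the slope has to absorb the 20 marks that the intercept would otherwise explain.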

  4. Asked: January 27, 2021 In: Linear Regression

    P-Value in Linear Regression

    Suchita SME
    Added an answer on January 28, 2021 at 4:45 am
    This answer was edited.

    If the p-value for the F-statistic is less than 0.05, we reject the null hypothesis that all beta coefficients are zero, which means that at least one beta coefficient is not zero. We conclude that the overall model is significant.

    The F-statistic tells us that the overall model is statistically significant, but we still need to identify whether any of the predictors included in the model are not related to the response variable.

    We check the t-statistic of each predictor to see which predictors are statistically significant and which are NOT related to the response variable.

    One might argue that if we check the t-statistics of the individual predictors and even one of them is significant, the overall model should be considered significant or valid, so why check the F-statistic for the overall validity of the model?

    That's because, at a 5% significance level (95% confidence level), about 5% of predictors that have no real relationship with the response will appear significant by sheer chance. This is especially true for models with many predictor variables. The F-statistic does not suffer from this problem because it adjusts for the number of predictor variables. Hence we confirm the overall validity of the model using the F-statistic.
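The "significant by sheer chance" point can be checked with a small simulation. The sketch below is an illustration under stated assumptions: the data are pure noise (no predictor is truly related to y), the sizes n = 500 and p = 40 are arbitrary, and the critical value 1.965 is a rounded two-sided 5% cut-off for roughly 459 degrees of freedom. OLS is computed by hand with NumPy rather than with a regression library.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 40                      # sample size and number of predictors (assumed)
X = rng.normal(size=(n, p))         # predictors: pure noise
y = rng.normal(size=n)              # response: unrelated to every predictor

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
XtX_inv = np.linalg.inv(A.T @ A)
beta = XtX_inv @ A.T @ y
residuals = y - A @ beta

dof = n - (p + 1)                   # residual degrees of freedom
rss = residuals @ residuals
sigma2 = rss / dof
std_err = np.sqrt(sigma2 * np.diag(XtX_inv))

# t-statistics of the predictors (intercept excluded)
t_stats = (beta / std_err)[1:]

t_crit = 1.965                      # approximate two-sided 5% critical value (assumed)
n_significant = int(np.sum(np.abs(t_stats) > t_crit))

# Overall F-statistic: explained variation per predictor over residual variance
tss = np.sum((y - y.mean()) ** 2)
f_stat = ((tss - rss) / p) / (rss / dof)

print("predictors flagged by t-test:", n_significant, "out of", p)
print("overall F-statistic:", round(float(f_stat), 3))
```

With 40 noise predictors we would expect around two of them to cross the t-test cut-off by chance in a typical run, while the single F-statistic judges all of them jointly. Turning the F-statistic into a p-value needs the F distribution, which is left out of this sketch.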

  5. Asked: January 26, 2021 In: Linear Regression

    Model Overfitting or Underfitting

    Suchita SME
    Added an answer on January 27, 2021 at 6:14 am
    This answer was edited.

    If the model performs well on the training dataset but does not perform well on the test set, it is an indication of overfitting, as the model is unable to generalise to unseen data.

    If the model performance is poor on both the training and the test set, then the model is underfitting. It indicates that the model is neither able to capture the underlying patterns in the data nor able to generalise to unseen data.
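One way to see both cases is to fit polynomials of increasing degree to the same small training set and compare training and test error. The sketch below is only an illustration: the sine-shaped data, the noise level, the sample sizes and the degrees 1, 3 and 9 are all assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples from a sine curve (synthetic, assumed data)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # larger unseen test set


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

A straight line (degree 1) usually shows high error on both sets, which is the underfitting pattern. The highest degree always has the lowest training error, and a large gap between its training and test error is the overfitting pattern described above.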


© 2021 Algoritmo Lab. All Rights Reserved