Can a characteristic of a data set make a linear regression model unusable?
1 Answer
Non-normality of the residuals and/or heterogeneity of variance (heteroscedasticity)
Explanation:
When fitting a model, you generally start with a linear model. Once you have done model selection (choosing the best model for what you want to do), you have to validate it.
For this, first make a quantile-quantile (Q-Q) plot of the residuals. If the points fall along a straight line (as below), you can assume the residuals are normally distributed.
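The Q-Q check can be sketched numerically. The snippet below is only an illustration on simulated data (the data, the seed and the helper names `fit_line`, `qq_points` and `pearson` are all assumptions, not part of the original answer). It fits a simple linear model, pairs the sorted residuals with the theoretical normal quantiles, and measures how straight the resulting Q-Q line is.

```python
import random
from statistics import NormalDist, mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def qq_points(residuals):
    """Pair each sorted residual with its theoretical standard-normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sorted(residuals)

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

# Simulated data (assumption): a linear trend plus normal noise.
rng = random.Random(0)
x = [i / 10 for i in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

theo, samp = qq_points(residuals)
qq_corr = pearson(theo, samp)  # close to 1 when the Q-Q points are nearly linear
print(f"Q-Q correlation: {qq_corr:.3f}")
```

Plotting `samp` against `theo` with any plotting library gives the Q-Q plot itself. The correlation is only a rough summary of its straightness, not a replacement for looking at the plot.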
Second, plot the residuals against the fitted values. If you see a pattern or a cone shape in the residuals, you have either a non-linear effect of one of your variables or heterogeneity of variance. On the graph below, we can assume homogeneity.
Third, plot the residuals against each explanatory variable to look for the same pattern or cone shape mentioned above.
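A cone shape means the spread of the residuals changes along the horizontal axis. As a rough, informal sketch (not a formal test), one can compare the mean squared residual in the upper half of the fitted values with that in the lower half. The simulated data sets and the helper name `spread_ratio` below are assumptions made for illustration.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def spread_ratio(fitted, residuals):
    """Mean squared residual for the upper half of fitted values over the lower half."""
    pairs = sorted(zip(fitted, residuals))  # order residuals by fitted value
    half = len(pairs) // 2
    low = mean(r * r for _, r in pairs[:half])
    high = mean(r * r for _, r in pairs[half:])
    return high / low

def residuals_and_fitted(x, y):
    """Fit the line and return (fitted values, residuals)."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]

rng = random.Random(1)
x = [1 + i / 20 for i in range(400)]

# Constant noise: residual spread should look the same everywhere.
y_homog = [2 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
# Noise growing with x: residuals fan out into a cone.
y_cone = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

ratio_homog = spread_ratio(*residuals_and_fitted(x, y_homog))
ratio_cone = spread_ratio(*residuals_and_fitted(x, y_cone))
print(f"homogeneous: {ratio_homog:.2f}, cone-shaped: {ratio_cone:.2f}")
```

A ratio near 1 is consistent with homogeneity, while a ratio far from 1 suggests a cone. The same function can be applied with each explanatory variable in place of the fitted values.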
If you have heterogeneity, you can try a generalized linear model (GLM). If you find a non-linear pattern, you can try a generalized additive model (GAM).
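To give an idea of what a GLM does, here is a minimal sketch of a Poisson GLM with a log link, fitted by iteratively reweighted least squares. The choice of the Poisson family, the simulated count data and the helper names `poisson_draw` and `fit_poisson_glm` are assumptions for illustration. In practice you would use a statistics package and pick the family that matches your response variable.

```python
import math
import random
from statistics import mean

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def fit_poisson_glm(x, y, iterations=25):
    """Fit log(E[y]) = a + b*x by iteratively reweighted least squares."""
    a, b = math.log(mean(y)), 0.0  # start from the intercept-only model
    for _ in range(iterations):
        eta = [a + b * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link (weight = mu).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Closed-form weighted least squares for intercept and slope.
        b = (sw * swxz - swx * swz) / (sw * swxx - swx * swx)
        a = (swz - b * swx) / sw
    return a, b

# Simulated counts (assumption): the variance grows with the mean,
# which is one common source of heterogeneity in a plain linear model.
rng = random.Random(2)
x = [5 * i / 299 for i in range(300)]
y = [poisson_draw(rng, math.exp(0.5 + 0.3 * xi)) for xi in x]

a_hat, b_hat = fit_poisson_glm(x, y)
print(f"intercept: {a_hat:.2f}, slope: {b_hat:.2f}")
```

The estimates are on the log scale, so `exp(a_hat + b_hat * x)` gives the fitted mean count. A GAM goes one step further by replacing the straight-line term `b*x` with a smooth function of `x`, which this sketch does not attempt.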