Wednesday, 5 December 2018

RegLin 101

Part two of my statistics crash course. I'll show you how to perform simple linear regression.

To get started, always identify the random variables: the independent variable and the dependent variable. Conventionally, X denotes the former and Y the latter.

Also state the hypotheses. The null hypothesis (H0) states that the independent variable does not affect the dependent variable. The alternative hypothesis (H1) states that it does.
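For a simple linear regression, the same hypotheses are commonly written in terms of the slope of the model. This is a standard textbook formulation added for illustration; the symbols β0 (intercept), β1 (slope), and ε (error term) are my notation, not the post's.

```latex
\begin{aligned}
Y   &= \beta_0 + \beta_1 X + \varepsilon \\
H_0 &: \beta_1 = 0    \quad \text{(X does not affect Y)} \\
H_1 &: \beta_1 \neq 0 \quad \text{(X affects Y)}
\end{aligned}
```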

Compile and enter the data for each variable. Don't mix the two variables up, and remember the commas between values and the notation that combines the values into a single variable. You can also plot the data: the independent variable goes first and the dependent variable second.
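A minimal sketch of the data-entry step, written in Python as a stand-in for the commands the post describes. The numbers are made-up example values, not real data, and the names `x`, `y`, and `points` are my own.

```python
# Hypothetical example data (made up for illustration only).
x = [1, 2, 3, 4, 5]  # independent variable X
y = [2, 4, 5, 4, 5]  # dependent variable Y, in the same order as x

# Each X value must line up with its Y value, so both lists need the same length.
if len(x) != len(y):
    raise ValueError("x and y must have the same number of observations")

# Pairs in (independent, dependent) order, the order used when plotting.
points = list(zip(x, y))
print(points)
```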

Determine the correlation and covariance between the two variables. The correlation always lies between -1 and 1. The closer it is to 1, the stronger the positive linear relationship; the closer it is to -1, the stronger the negative (inverse) linear relationship; and the closer it is to 0, the weaker the linear relationship between them.

Find the model coefficients and the regression equation. The coefficients appear in the regression summary table; pay attention to the estimated coefficients, the p-value, and the R-squared of the model.
a. Build the model with lm: put the dependent variable first and the independent variable after it, connected by a tilde (Y ~ X). Summarise the model and you'll get a table showing everything you need.

Plot the data with the fitted regression line, and run an analysis of variance (ANOVA) on the model to get the F value and its two degrees of freedom (Df1 and Df2). You need these to test whether the independent variable affects the dependent variable. Use the abline (A-B line) command to draw the line and the analysis of variance (anova) command on the model.

Compare the F value with the critical F. To get the critical F you need alpha, the first degree of freedom, and the second degree of freedom; use qf, as in qf(1 - alpha, Df1, Df2).
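A sketch of the critical-F lookup in Python, assuming SciPy is installed. `scipy.stats.f.ppf` is the quantile function of the F distribution, so it plays the same role as R's `qf`. The degrees of freedom are the example values from the ANOVA above.

```python
from scipy import stats

alpha = 0.05
df1, df2 = 1, 3  # example degrees of freedom (one predictor, five observations)

# Upper-tail critical value: the same quantity as qf(1 - alpha, df1, df2) in R.
f_crit = stats.f.ppf(1 - alpha, df1, df2)
print(f"critical F = {f_crit:.3f}")  # critical F = 10.128
```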

As with other hypothesis tests, if the F value is greater than the critical F, reject the null hypothesis; otherwise, fail to reject it. In linear regression, rejecting the null hypothesis means the independent variable does affect the dependent variable, while failing to reject it means there is not enough evidence that it does.
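The decision rule can be sketched as below, again assuming SciPy. It also computes the p-value with `scipy.stats.f.sf` (the upper-tail probability) to show that "F greater than critical F" and "p-value less than alpha" always lead to the same decision. The F value and degrees of freedom are the made-up example results from above.

```python
from scipy import stats

alpha = 0.05
f_value, df1, df2 = 4.5, 1, 3  # example ANOVA results

f_crit = stats.f.ppf(1 - alpha, df1, df2)
p_value = stats.f.sf(f_value, df1, df2)  # P(F >= f_value) if H0 is true

# Two equivalent rules: F > critical F exactly when p < alpha.
reject_by_f = f_value > f_crit
reject_by_p = p_value < alpha

if reject_by_f:
    print("Reject H0: X appears to affect Y.")
else:
    print("Fail to reject H0: not enough evidence that X affects Y.")
```

With these example numbers, F = 4.5 is below the critical value of about 10.13 and the p-value is about 0.12, so H0 is not rejected.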

The summary also gives the coefficient of determination, R-squared. Converted to a percentage, it is the share of the variation in the dependent variable explained by the model: the greater it is, the better the model fits the actual data.

Check the residuals (errors). Obtain them by passing your regression model to the residual command. Then plot the residuals against the independent variable, adding labels for the axes and a title (don't forget the quotation marks). You can also plot the whole regression model. Finally, check whether the residuals are normally distributed: compute the standardized residuals of the model and draw a Q-Q plot of them.


To generate the errors, link the model and residuals with a dollar sign (for example, model$residuals, where model is the name of your fitted model); this gives you every single residual. You can then plot the residuals with a box plot to see whether there are any outliers. Note that the line inside the box marks the median, not the mean.
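Box plots usually flag as outliers any points beyond 1.5 times the interquartile range (IQR) from the quartiles. This Python sketch applies that rule; the function name `boxplot_summary` is my own, and the quartiles use `statistics.quantiles(..., method="inclusive")`. R's `boxplot` computes its hinges slightly differently, so the cutoffs may not match exactly for every data set.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, median, and 1.5 * IQR outliers, as a box plot would show them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Residuals from the hypothetical example fit (Y = 2.2 + 0.6 X).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
print(boxplot_summary(residuals))                      # no outliers here
print(boxplot_summary([1, 2, 3, 4, 100])["outliers"])  # [100]
```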


Interpret the results.
a. Based on the regression summary, write down the linear regression equation using the estimated coefficients.
b. Determine whether X affects Y, using the hypotheses stated at the start. You can compare the F value with the critical F, or equivalently compare the p-value from the analysis of variance summary with alpha.
c. Determine how well the model fits, using the R-squared value from the regression summary. Check for residual outliers with a box plot, check that the residual variance is constant (for example, in the plot of residuals against X), and check that the residuals are normally distributed.

That is all I have on the linear regression procedure.
