Part two of my statistics crash course. This time I'll show you how to perform simple linear regression.
To get started, always identify the random variables and decide which one is independent and which one is dependent. Conventionally, X denotes the independent variable and Y the dependent variable.
Next, define the hypotheses. The null hypothesis (H0) states that the independent variable does not affect the dependent variable, and the alternative hypothesis (H1) states that it does.
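Written out for simple linear regression, the model and the two hypotheses look like this. The symbols are standard textbook notation rather than notation used elsewhere in this post: β0 is the intercept, β1 is the slope on X, and ε is the random error.

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
\qquad
H_0:\ \beta_1 = 0 \ \text{(X does not affect Y)}
\qquad
H_1:\ \beta_1 \neq 0 \ \text{(X affects Y)}
```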
Compile and enter the data for each variable. Take care not to mix the two variables up, and remember the commas and the c(...) notation that combines the values into one vector. You can also plot the data; the independent variable goes first and the dependent variable second.
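In R this step means typing each variable as a comma-separated vector and then calling plot(x, y). As a rough stand-in, here is a stdlib Python sketch of the same bookkeeping. The numbers are made up purely for illustration, and paired is a hypothetical helper: it keeps X and Y as parallel lists and refuses to continue if they do not line up.

```python
# Hypothetical example data: X is the independent variable, Y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def paired(xs, ys):
    """Return (x, y) pairs, failing loudly if the two lists do not line up."""
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    # Independent variable first, dependent second, the same order as plot(x, y) in R.
    return list(zip(xs, ys))

pairs = paired(x, y)
print(pairs)  # [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
```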
Determine the correlation and covariance between the two variables. The correlation coefficient always lies between -1 and 1. The closer it is to 1, the stronger the positive linear relationship; the closer it is to -1, the stronger the negative (inverse) linear relationship; and the closer it is to 0, the weaker the linear relationship between them.
Find the model coefficients and the equation. You'll find the coefficients in the regression summary table. Pay attention to the estimated model coefficients, their p-values, and the R-squared of the model.
a. Build the model with the lm command: put the dependent variable first and the independent variable second, connected by a tilde (for example, y ~ x). Summarize the model and you'll get a table showing everything you need.
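In R this is lm(y ~ x) followed by summary() of the model. To show where the numbers in that table come from, here is a hand-rolled least-squares sketch in Python on hypothetical data. It uses the textbook formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)) and is not a reproduction of how R computes the fit internally.

```python
# Hypothetical example data (X independent, Y dependent).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = s_xy / s_xx                  # estimated coefficient on X
intercept = y_bar - slope * x_bar    # estimated intercept
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"Y = {intercept:.2f} + {slope:.2f}X, r^2 = {r_squared:.2f}")
# Y = 2.20 + 0.60X, r^2 = 0.60
```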
Plot the model and run an analysis of variance to find the F statistic and its two degrees of freedom (the first and second Df). You need these to test whether the independent variable affects the dependent variable. You will need the abline command to draw the fitted line and the anova command on the model.
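In R, anova(model) reports the F value with its two degrees of freedom, and abline(model) draws the fitted line over the scatter plot. Plotting has no stdlib Python equivalent, so this sketch (hypothetical data again) only reproduces the one-predictor ANOVA arithmetic: the regression sum of squares has 1 degree of freedom and the residual sum of squares has n - 2.

```python
# Hypothetical example data (X independent, Y dependent).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * a for a in x]

ss_reg = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by X
ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))   # variation left in the residuals

df1 = 1        # one predictor
df2 = n - 2    # residual degrees of freedom
f_value = (ss_reg / df1) / (ss_res / df2)

print(f"F = {f_value:.2f} on {df1} and {df2} DF")  # F = 4.50 on 1 and 3 DF
```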
Compare the F value with the critical F, which you can get with the qf command. For that you need alpha, the first degree of freedom, and the second degree of freedom.
The decision rule is the same as before: if the F value is greater than the critical F, the null hypothesis is rejected; otherwise it is not rejected. In linear regression, rejecting the null hypothesis means the independent variable does affect the dependent variable, while failing to reject it means there is no evidence that it does.
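In R the critical value is qf(1 - alpha, df1, df2). The Python standard library has no F distribution, so this sketch builds one under stated assumptions: it evaluates the F CDF through the regularized incomplete beta function (the continued-fraction method described in Numerical Recipes) and inverts it by bisection. The function names are hypothetical, and the F value of 4.5 on 1 and 3 degrees of freedom is an illustrative number, not a real result.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the form of the continued fraction that converges quickly for this x.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_cdf(f, df1, df2):
    """P(F <= f) for an F(df1, df2) random variable."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))

def f_critical(alpha, df1, df2, tol=1e-10):
    """Upper-tail critical F, intended to match R's qf(1 - alpha, df1, df2)."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < target:   # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                  # plain bisection on the CDF
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05
f_value, df1, df2 = 4.5, 1, 3   # illustrative numbers only
f_crit = f_critical(alpha, df1, df2)
if f_value > f_crit:
    decision = "reject H0: X appears to affect Y"
else:
    decision = "fail to reject H0: no evidence that X affects Y"
print(f"F = {f_value}, critical F = {f_crit:.4f} -> {decision}")
```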
You will also find the fitness coefficient, R-squared (the coefficient of determination), in the summary. Convert it to a percentage; the greater it is, the better the model fits the actual data.
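For reference, the standard definition of R-squared in terms of sums of squares is shown below, where ŷ_i are the fitted values and ȳ is the mean of Y. An R-squared of 0.6, for instance, reads as 60% of the variation in Y being explained by the model.

```latex
R^2 = \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}},
\qquad
SS_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
SS_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```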
Check the residuals, or errors. Get the residuals by passing your regression model to the residuals command. Then plot the residuals against the independent variable, adding axis labels and a title (don't forget the quotation marks). You can also plot the whole regression model. You also need to check whether the residuals are normally distributed: compute the standardized residuals of the model with rstandard and draw them on a Q-Q plot (qqnorm).
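A stdlib Python sketch of what resid(model), rstandard(model) and qqnorm compute, on hypothetical data. Standardized residuals here divide each residual by s * sqrt(1 - h), where h is the point's leverage, which is how R's rstandard defines them for a linear model. The Q-Q plotting positions (i - 0.5) / n are one common choice; R's qqnorm uses ppoints(), which differs slightly for small samples. The plots themselves are left out; the code only produces the numbers you would plot.

```python
import math
from statistics import NormalDist

# Hypothetical example data (X independent, Y dependent).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar

# Raw residuals: observed minus fitted.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Standardized residuals: residual / (s * sqrt(1 - leverage)).
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
leverage = [1 / n + (a - x_bar) ** 2 / s_xx for a in x]
std_residuals = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Q-Q pairs: theoretical normal quantiles vs. sorted standardized residuals.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_pairs = list(zip(theoretical, sorted(std_residuals)))

# Residuals against X, to eyeball whether the spread stays roughly constant.
for a, e in zip(x, residuals):
    print(f"x = {a}: residual = {e:+.2f}")
```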
Inspect the individual errors and draw a box plot. Linking the model and its residuals with a dollar sign (model$residuals) lists every single residual. You can then plot the residuals in a box plot to see whether there are any outliers. Note that a box plot marks the median, so compute the mean of the residuals separately.
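A sketch of the outlier check behind boxplot(model$residuals), assuming the common rule of flagging points more than 1.5 times the interquartile range beyond the quartiles. The residual values are hypothetical, and R's boxplot computes its hinges slightly differently from statistics.quantiles, so borderline points may be classified differently.

```python
import statistics

# Hypothetical residuals from a fitted simple linear regression.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [e for e in residuals if e < lower_fence or e > upper_fence]
mean_residual = statistics.mean(residuals)  # close to 0 when the model has an intercept

print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f}), outliers = {outliers}")
print(f"mean residual = {mean_residual:.2f}")
```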
Interpret the results.
a. Based on the regression summary, determine the linear regression equation. The summary shows the estimated coefficients for the equation.
b. Determine whether X affects Y, using the hypotheses defined at the start. You can compare the F value with the critical F, or compare alpha with the p-value from the analysis of variance summary.
c. Determine the fitness of the model by checking the fitness coefficient (R-squared) in the regression summary. Check for residual outliers by plotting the residuals in a box plot. Check whether the residual variance is constant. Check whether the residuals are normally distributed.
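The interpretation steps can be rolled into a small report. This is a hypothetical helper, not something R prints for you: it takes the intercept, slope, p-value and R-squared you read off the regression summary (the values below are placeholders, not results from real data) and writes out the equation, the hypothesis decision at a chosen alpha, and R-squared as a percentage. The residual checks in step c still need the plots.

```python
def interpret(intercept, slope, p_value, r_squared, alpha=0.05):
    """Turn regression summary numbers into plain-language conclusions."""
    sign = "+" if slope >= 0 else "-"
    equation = f"Y = {intercept:.2f} {sign} {abs(slope):.2f}X"
    if p_value < alpha:
        effect = f"p = {p_value:.3f} < {alpha}: reject H0, X affects Y"
    else:
        effect = f"p = {p_value:.3f} >= {alpha}: fail to reject H0, no evidence X affects Y"
    fit = f"R^2 = {r_squared * 100:.1f}% of the variation in Y is explained by X"
    return [equation, effect, fit]

# Placeholder values standing in for numbers read off a regression summary.
for line in interpret(intercept=2.2, slope=0.6, p_value=0.12, r_squared=0.6):
    print(line)
```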
That is all I have for the linear regression procedure.