Here be dragons!: 2018

Wednesday, 5 December 2018

RegLin 101

Part two of my statistics crash course. I'll show you how to perform regression.

Getting started, always define the random variables for both the independent and the dependent variable. Commonly X is used for the former and Y for the latter.

Also define the hypotheses. The null hypothesis (H0) should state that the independent variable does not affect the dependent variable, and the alternative hypothesis (H1) that it does.

Enter the data for each variable as a vector. Do not swap the two, and always remember the commas and the combine notation, c(). You can also plot the data; the independent variable goes first and the dependent variable second.
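
A rough sketch of this step, assuming the program is R (which is where lm and qf come from). The numbers are made up purely for illustration, and the names x and y are my own choice.

```r
# Made-up illustrative data: x is the independent variable, y the dependent one.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# When run as a script, send plots to a null device so no file is written.
if (!interactive()) pdf(NULL)

# Scatter plot: independent variable first, dependent variable second.
plot(x, y)
```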

Determine the correlation and covariance between the two variables. The correlation ranges from negative one to one. The closer it is to one, the stronger the positive linear relationship; the closer it is to negative one, the stronger the inverse linear relationship; and the closer it is to zero, the weaker the linear relationship between them.
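
As a small sketch, assuming R and the same made-up numbers, the correlation and covariance could be obtained with cor and cov:

```r
# Made-up illustrative data.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r_xy <- cor(x, y)  # Pearson correlation, always between -1 and 1
s_xy <- cov(x, y)  # sample covariance

print(r_xy)
print(s_xy)
```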

Find the model coefficients and the equation. Build the model with lm: the dependent variable goes first and the independent variable second, connected with a tilde. Summarise the model and you'll get a table showing whatever you need. Watch for the estimated model coefficients, the p-value, and the R-squared of the model.
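
A minimal sketch of the model step, assuming R and made-up data. The names model and fit_summary are only placeholders.

```r
# Made-up illustrative data.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Dependent variable first, then a tilde, then the independent variable.
model <- lm(y ~ x)

# The summary table holds the coefficients, their p-values and R-squared.
fit_summary <- summary(model)
print(fit_summary)

coef(model)               # intercept and slope of the regression equation
fit_summary$coefficients  # estimates, standard errors, t values, p-values
fit_summary$r.squared     # R-squared of the model
```

With these numbers the fitted equation comes out as Y = 2.2 + 0.6 X.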

Plot the model and do an analysis of variance to find the F statistic and the first and second degrees of freedom (Df). You will need these to see whether or not the independent variable affects the dependent variable. Use the abline command to draw the fitted line and the anova command on the model.
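
Sketched in R with the same made-up data, this step might look as follows. Pulling the values out of the table by column name is my own shortcut; reading them off the printed table works just as well.

```r
# Made-up illustrative data and model.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# When run as a script, send plots to a null device so no file is written.
if (!interactive()) pdf(NULL)

plot(x, y)     # the data
abline(model)  # the fitted regression line on top of it

# Analysis of variance table for the model.
aov_table <- anova(model)
print(aov_table)

f_value <- aov_table[["F value"]][1]  # F statistic
df1 <- aov_table[["Df"]][1]           # first degrees of freedom (model)
df2 <- aov_table[["Df"]][2]           # second degrees of freedom (residuals)
```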

Compare the F value with the critical F. To get the critical F with qf, you will need the alpha, degree of freedom 1, and degree of freedom 2.

The same principle applies: if the F value is greater than the critical F, the null hypothesis is rejected; otherwise it is not rejected. In the case of linear regression, rejecting the null hypothesis means the independent variable does affect the dependent variable, and failing to reject it means no effect was shown.
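
A small sketch of the comparison, assuming R, made-up data, and an alpha of 0.05 that I picked only for illustration:

```r
# Made-up illustrative data and model.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
aov_table <- anova(model)

f_value <- aov_table[["F value"]][1]
df1 <- aov_table[["Df"]][1]
df2 <- aov_table[["Df"]][2]

alpha <- 0.05  # assumed significance level

# Critical F: the (1 - alpha) quantile of the F distribution.
f_critical <- qf(1 - alpha, df1, df2)

# TRUE means H0 is rejected, so X is taken to affect Y.
reject_h0 <- f_value > f_critical
print(c(f_value = f_value, f_critical = f_critical))
print(reject_h0)
```

With only five made-up points the F value stays below the critical F, so H0 is not rejected here.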

You will also find the goodness of fit in the form of R-squared. Convert it into a percentage; the greater it is, the better the model fits the actual data.

Check the residuals, or errors. Get the residuals by passing your regression model to the resid command. Then plot the residuals against the independent variable; you can set the axis labels and the title, and don't forget the quotation marks around them. You can also plot the whole regression model. You will also need to check the normality of the residual distribution: get the standardised residuals from the model with rstandard and plot them with a Q-Q plot.
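
Here is how the residual checks might be sketched in R. The data, the labels and the titles are made up; adding qqline for a reference line is my own extra.

```r
# Made-up illustrative data and model.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# When run as a script, send plots to a null device so no file is written.
if (!interactive()) pdf(NULL)

# Residuals against the independent variable, with quoted labels and title.
res <- resid(model)
plot(x, res, xlab = "X", ylab = "Residuals", main = "Residuals against X")
abline(0, 0)  # horizontal reference line at zero

# Normality check: Q-Q plot of the standardised residuals.
std_res <- rstandard(model)
qqnorm(std_res, main = "Normal Q-Q plot of standardised residuals")
qqline(std_res)
```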

Generate the errors and create the box plot. Link the model and residuals with a dollar sign to list every single residual. You can then draw the residuals as a box plot to see whether or not there are outliers and where the centre of the residuals lies (the box plot marks the median).
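
A short sketch, assuming R and made-up data. Using boxplot.stats to list the outliers is my own addition, as a way to get them without reading them off the picture.

```r
# Made-up illustrative data and model.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# When run as a script, send plots to a null device so no file is written.
if (!interactive()) pdf(NULL)

# The dollar sign pulls every residual out of the model.
errors <- model$residuals
print(errors)

boxplot(errors, main = "Residuals")

# Points beyond the whiskers, i.e. the outliers the box plot would show.
outliers <- boxplot.stats(errors)$out
print(outliers)
```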

Interpret the results.
a. Based on the regression summary, determine the linear regression equation. The estimated coefficients for the equation are in the summary of the regression.
b. Determine whether X affects Y, using the hypotheses written far above. It is fine to compare either the F value with the critical F, or the alpha with the p-value obtained from the analysis of variance summary.
c. Determine the fit of the model. Check the R-squared from the summary of the regression. Check for residual outliers by plotting the residuals as a box plot. Check whether the residual variance is constant. Check the normality of the residuals' distribution.

That is all I can give for the linear regression procedures.

ANOVA 101

G'day, it's been a while.

This one will not be like my usual logs, but I do hope it is useful.

One of the things men of science need to do is research, and quite often they obtain a lot of data from multiple samples and multiple treatments. Commonly they want to know whether the treatments are the same or different from one another. This can be done with an analysis of variance, which is tedious if you have to do it with a pen and a sheet of paper, so I went looking for a more "updated" method to clear this up.

Be aware that this program requires a lot of coding for the script input.

Getting started, define the random variable and the hypotheses of the analysis of variance. Let's say X_ij is the j-th observation from the i-th treatment.

For the hypotheses, the null hypothesis (H0) is that there are no differences between the treatment means, and the alternative hypothesis (H1) is that at least one treatment mean is significantly different.

You don't need to "link" a separate data sheet; you can also enter the values manually if there are not that many.

In this case, let's say I have k treatments and n observations for each; create a matrix for them. Name the table anything you like; here the matrix is called the Predictor. Enter the values as if they were a table, laid out sideways in the script. Define the number of rows (downwards) and columns (sideways). Pay attention to those commas.

Name the transformed data anything you like. Take the table, convert it to a matrix, transpose it, and combine the result into a single vector. Remember that the name of the table goes innermost, so it is typed last. Do not forget the combine notation, c(), and the transpose notation, t().
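
A rough sketch of the table and the transformation, assuming R. The numbers are made up (k = 3 treatments in the columns, n = 4 observations in the rows), and filling the matrix row by row with byrow = TRUE is my reading of typing the table "sideways".

```r
# Made-up illustrative table: one column per treatment, one row per observation.
Predictor <- matrix(c(1, 3, 5,
                      2, 4, 6,
                      3, 5, 7,
                      2, 4, 6),
                    nrow = 4, ncol = 3, byrow = TRUE)
print(Predictor)

# Transpose, then combine into one data vector. The table name sits innermost.
# The vector runs row by row: treatment 1, 2, 3, then 1, 2, 3 again, and so on.
r <- c(t(as.matrix(Predictor)))
print(r)
```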

Define new variables for the treatments and the number of observations. Name the treatment labels anything you like and combine them with c(). Set the number of treatments, the number of observations per treatment, and the total number of observations; you can use any letters for these. Determine the first and second degrees of freedom: the first is the number of treatments minus one, and the second is the total number of observations minus the number of treatments. Finally, make the treatment factor match the data vector with gl. You will need the number of treatments, 1 as the run length of each level, the total number of observations, and the labels you listed in the first step.
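
These definitions might be sketched in R like this. The labels A, B, C and the letters k, n, N are placeholders of mine, sized for a made-up table of 3 treatments with 4 observations each.

```r
# Placeholder treatment labels, combined with c().
f <- c("A", "B", "C")

k <- 3      # number of treatments
n <- 4      # observations per treatment
N <- k * n  # all observations

df1 <- k - 1  # first degrees of freedom
df2 <- N - k  # second degrees of freedom

# Treatment factor matching a data vector that cycles A, B, C, A, B, C, ...
# gl(number of levels, run length of each level, total length, labels)
tm <- gl(k, 1, N, labels = f)
print(tm)
```
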
Create the analysis of variance summary table. Pick a variable name for the analysis of variance, connect the data vector and the factor vector with a tilde in the call, and summarise the result.

Compare the F value from the summary with the critical F. For the latter, you'll need the alpha, degree of freedom 1, and degree of freedom 2 as inputs. Do not forget the combine notation. Note that if F value > critical F and p-value < alpha, H0 is rejected, which shows a difference between treatments. Otherwise H0 is not rejected, and the treatments show no significant difference.
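
Putting the last steps together, a sketch assuming R's aov, made-up data for 3 treatments with 4 observations each, and an alpha of 0.05 chosen only for illustration:

```r
# Made-up data vector, cycling through treatments A, B, C.
r <- c(1, 3, 5, 2, 4, 6, 3, 5, 7, 2, 4, 6)
tm <- gl(3, 1, 12, labels = c("A", "B", "C"))

# Data vector and factor vector connected with a tilde, then summarised.
av <- aov(r ~ tm)
av_summary <- summary(av)
print(av_summary)

# F value and p-value from the first row of the summary table.
f_value <- av_summary[[1]][["F value"]][1]
p_value <- av_summary[[1]][["Pr(>F)"]][1]

alpha <- 0.05                                  # assumed significance level
f_critical <- qf(1 - alpha, df1 = 2, df2 = 9)  # df1 = k - 1, df2 = N - k

# TRUE means H0 is rejected: at least one treatment mean differs.
reject_h0 <- f_value > f_critical && p_value < alpha
print(reject_h0)
```

With these made-up numbers the F value is far above the critical F, so H0 is rejected.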

Linear regression is also used quite often. I'll cover that in the next post.

Monday, 1 October 2018

Strong and BIG

Well, well... this one will be short...

but I'm just letting myself (and anyone who reads this) know that I've joined an agriculture technology idea competition hosted by JAPFA Comfeed and Vaksindo.

Hoping that this will go very well...

here it is, what I did a few hours ago. It was sent to them!