R-square: the proportion of the variance of the dependent variable that is explained by the independent variables in a regression model.
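adjusted R-square: R-square penalized for the number of predictors p, computed as 1 - (1 - R-square)(n - 1)/(n - p - 1); unlike R-square, it can decrease when a predictor that adds little is included.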
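The two quantities above can be sketched in a few lines of Python. This is a minimal illustration, not a full regression routine: the data and fitted values are made up, and the function names `r_squared` and `adjusted_r_squared` are hypothetical helpers. It uses the standard definitions R-square = 1 - SS_res / SS_tot and adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-square for the number of predictors p (n = sample size)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up observations and fitted values, for illustration only
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 4), round(adj_r2, 4))
```

Whenever R-square is below 1 and there is at least one predictor, the adjusted value comes out smaller than R-square.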
Measures of central tendency: mean, median and mode
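As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The sample below is made up.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample

mean = statistics.mean(data)      # arithmetic average: 30 / 6
median = statistics.median(data)  # middle of the sorted data: average of 3 and 5
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Here the mean is larger than the median because the single large value 10 pulls the average upward, while the median is unaffected by how large that value is.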
Central Limit Theorem: If n is large, the sampling distribution of the sample mean y_bar is approximately normal, regardless of the distribution of y (provided y has finite variance)
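A small simulation can make the theorem concrete. The sketch below assumes an exponential distribution for y (chosen only because it is clearly skewed) and arbitrary values for the sample size and the number of repetitions. It checks that the sample means center on the population mean with spread close to sigma / sqrt(n).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50            # size of each sample
n_samples = 5000  # number of samples drawn

# Each y comes from an exponential distribution with mean 1 and
# standard deviation 1, which is strongly right-skewed.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

center = statistics.fmean(sample_means)  # should be close to the population mean, 1
spread = statistics.stdev(sample_means)  # should be close to 1 / sqrt(n)
print(center, spread, 1 / n ** 0.5)
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are not.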
Hypothesis tests
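t-test: compares the means of two samples. Independent two-sample version: the samples are independent and assumed to have the same variance. Paired version: the two samples are paired and dependent.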
F test: compares between-group variability with within-group variability; a large ratio suggests that the group means differ.
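The test statistics described above can be sketched by hand. The helper names `pooled_t`, `paired_t` and `one_way_f` are hypothetical and the measurements are made up. The code computes only the statistics, not p-values, which would need the t and F distributions (for example from a statistics library).

```python
import statistics


def pooled_t(a, b):
    """Two-sample t statistic for independent samples with equal variance."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def paired_t(a, b):
    """Paired t statistic: a one-sample t statistic on the pairwise differences."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.fmean(d) / (statistics.stdev(d) / len(d) ** 0.5)


def one_way_f(*groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up measurements, for illustration only
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.2, 4.8, 4.5, 5.0, 4.4]

t = pooled_t(a, b)
f = one_way_f(a, b)
print(t, paired_t(a, b), f)
```

With exactly two groups, the F statistic equals the square of the pooled t statistic, which is a handy consistency check between the two tests.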
Diagnostic Plots
Residual plot (e vs X): a U shape or inverted U shape is a non-random pattern and suggests a nonlinear relationship; a random scatter with even spread suggests constant variance
Residuals vs time: randomly scattered residuals suggest no serial correlation
Q-Q plot: checks whether the residuals are normally distributed; if they are, the points lie close to the line y = x.
Residuals vs Fitted plot: if the red (smoothed) line shows no obvious curved pattern in the residuals, a model with an added quadratic term is unlikely to help.
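To see what a curved residual pattern looks like in numbers, the sketch below fits a straight line to made-up data that is purely quadratic and prints the residuals. `fit_line` is a hypothetical helper implementing ordinary least squares for a single predictor.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    return y_mean - b1 * x_mean, b1


# Made-up data with a purely quadratic relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)
```

The residuals are positive at both ends and negative in the middle, which is the U shape described above. In this case a quadratic term would help, the opposite of the flat-line situation.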
Assumptions of Linear Regression
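- The relationship between the independent and dependent variables is linear
- The residuals are approximately normally distributed (sometimes stated loosely as all variables being multivariate normal)
- There is little or no multicollinearity in the data, i.e. the independent variables are not highly correlated with each other
- There is little or no autocorrelation in the data; autocorrelation occurs when the residuals are not independent of each other
- The variance of the residuals is constant (homoscedasticity)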
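Two of these assumptions lend themselves to quick numeric checks. The sketch below is a rough screen rather than a formal test, and the data are made up. It uses the pairwise correlation between two predictors as a crude indicator of multicollinearity, and the lag-1 correlation of time-ordered residuals as a crude indicator of autocorrelation. The helper names `pearson` and `lag1_autocorrelation` are hypothetical.

```python
def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    u_mean, v_mean = sum(u) / n, sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    var_u = sum((a - u_mean) ** 2 for a in u)
    var_v = sum((b - v_mean) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one immediately before it."""
    return pearson(residuals[:-1], residuals[1:])


# Made-up predictors: x2 is almost a multiple of x1, so they are nearly collinear
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

# Made-up residuals in time order: a slow drift, so neighbours resemble each other
residuals = [-1.0, -0.8, -0.3, 0.2, 0.7, 1.2]

print(pearson(x1, x2), lag1_autocorrelation(residuals))
```

Both values come out close to 1 here, which would flag problems with the multicollinearity and autocorrelation assumptions. Values near 0 would be consistent with them, though pairwise correlation can miss collinearity that involves three or more predictors.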