Data obtained when observing two characteristics of one object
(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)
Linear relationship
s_{xy} = \frac{1}{n-1}\sum_{i=1}^n(x_i - \bar x)(y_i - \bar y) = \frac{1}{n-1}(\sum_{i=1}^nx_iy_i - n\bar x\bar y)
r_{xy} = \frac{s_{xy}}{s_xs_y}
Note that -1 \le r_{xy} \le 1
Measure of monotonicity
\rho_{xy} = \frac{s_{r(x)r(y)}}{s_{r(x)}s_{r(y)}}
Where r(x) denotes the rank of x
If there are no ties:
\rho_{xy} = 1 - \frac{6\sum_{i=1}^n d_i^2}{n(n^2 - 1)}
Where d_i = r(x_i) - r(y_i)
y = f(x)
y = f(x) + \varepsilon
Y_i = \beta_0 + \beta_1x_i + \varepsilon_i
Regression line \hat y = \hat \beta_0 + \hat \beta_1 x can be estimated using least squares. These estimators are:
Unbiased estimator for \sigma^2: s_R^2 = \frac{1}{n-2} \sum_{i=1}^n(y_i - \hat y_i)^2
\hat \beta_0 \sim N(\beta_0, \sqrt{\sigma^2(\frac{1}{n} + \frac{\bar x^2}{(n-1)s_X^2})})
\frac{\hat \beta_0 - \beta_0}{\sqrt{s_R^2(\frac{1}{n} + \frac{\bar x^2}{(n-1)s_X^2})}} \sim t_{n-2}
\hat \beta_1 \sim N(\beta_1, \sqrt{\frac{\sigma^2}{(n-1)s_X^2}})
\frac{\hat \beta_1 - \beta_1}{\sqrt{\frac{s_R^2}{(n-1)s_X^2}}} \sim t_{n-2}
Goodness of fit
R^2 = r^2_{(x,y)}
is the percentage of the sample variability in the y variable is explained by the model (linear dependence on x). Higher the better.
\begin{aligned} \text{Total sum of squares (SST)} &= \text{Residual sum of squares (SSR)} &&+ \text{Model sum of squares (SSM)} \\ \sum_i (y_i - \bar y)^2 &= \sum_i (y_i - \hat y)^2 &&+ \sum_i (\hat y - \bar y)^2 \\ \end{aligned}
then
R^2 = 1 - \frac{\text{SSR}}{\text{SST}} = \frac{\text{SSM}}{\text{SST}}