A point estimate of a parameter \theta is a single value, computed from sample data, regarded as a sensible choice for \theta
bias(\hat \theta) = E(\hat \theta) - \theta
Var(\hat \theta) = E[(\hat \theta - E(\hat \theta))^2]
Among all unbiased estimators of \theta, the one with the smallest variance is called the minimum variance unbiased estimator (MVUE). For a normal sample, \bar X is the MVUE of \mu
MSE(\hat \theta) = E[(\hat \theta - \theta)^2] = Var(\hat \theta) + bias^2(\hat \theta)
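The decomposition follows by adding and subtracting E(\hat \theta) inside the square:
MSE(\hat \theta) = E[(\hat \theta - E(\hat \theta) + E(\hat \theta) - \theta)^2] = Var(\hat \theta) + 2\,bias(\hat \theta)\,E[\hat \theta - E(\hat \theta)] + bias^2(\hat \theta)
and the cross term vanishes because E[\hat \theta - E(\hat \theta)] = 0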
Model P = \{f(\cdot; \theta) : \theta \in \Theta\} is called regular if differentiation and integration can be interchanged:
\frac{\partial}{\partial \theta} \int T(x)f(x; \theta)dx = \int T(x) \frac{\partial}{\partial \theta} f(x; \theta)dx
For example, Uniform(0, \theta) is not regular, since its support depends on \theta
The Fisher information I(\theta) in a single observation from a pdf or pmf f(x; \theta) is the variance of the score U = \frac{\partial \ln f(X; \theta)}{\partial \theta}
I(\theta) = Var[\frac{\partial \ln f(X; \theta)}{\partial \theta}]
In a regular model E[U] = 0, so equivalently I(\theta) = E[U^2]
If f(x; \theta) is twice differentiable in \theta (and the model is regular), then
I(\theta) = -E[\frac{\partial^2 \ln(f(X; \theta))}{\partial \theta^2}]
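A worked check with X \sim Bernoulli(p), where \ln f(x; p) = x \ln p + (1 - x)\ln(1 - p):
U = \frac{X}{p} - \frac{1 - X}{1 - p}, \quad I(p) = Var[U] = \frac{Var[X]}{p^2(1 - p)^2} = \frac{1}{p(1 - p)}
The second-derivative form agrees: -E[\frac{\partial^2 \ln f(X; p)}{\partial p^2}] = \frac{E[X]}{p^2} + \frac{1 - E[X]}{(1 - p)^2} = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}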
Let I_n(\theta) be the Fisher information in a random sample X_1, \cdots, X_n. Since the log-likelihood of the sample is a sum of n iid scores whose variances add, I_n(\theta) = nI(\theta)
If the statistic T = t(X_1, \cdots, X_n) is an unbiased estimator of \theta, then the Cramér-Rao inequality gives
Var[T] \ge \frac{1}{I_n(\theta)}
The ratio of the Cramér-Rao lower bound to Var[T] is called the efficiency of T. T is called efficient if its efficiency equals 1.
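For example, for a random sample from Poisson(\lambda), I(\lambda) = \frac{1}{\lambda}, so the bound is \frac{1}{I_n(\lambda)} = \frac{\lambda}{n}; since Var[\bar X] = \frac{\lambda}{n}, the estimator \bar X attains the bound and is efficient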
Let T_1 and T_2 be unbiased estimators of \theta. Then
eff(T_1, T_2) = \frac{Var[T_2]}{Var[T_1]}
is called the relative efficiency of T_1 with respect to T_2.
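A classical comparison for a normal sample: the sample median M is unbiased for \mu with asymptotic variance \frac{\pi \sigma^2}{2n}, while Var[\bar X] = \frac{\sigma^2}{n}, so asymptotically eff(M, \bar X) = \frac{2}{\pi} \approx 0.64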
The k-th population moment is E[X^k]. The k-th sample moment is \frac{1}{n} \sum_{i=1}^n X_i^k
Let X_1, \cdots, X_n be a random sample from f(x; \theta_1, \cdots, \theta_m), where the \theta_i are unknown parameters. The moment estimators \hat \theta_1, \cdots, \hat \theta_m are obtained by equating the first m sample moments to the corresponding population moments and solving for \theta_1, \cdots, \theta_m
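For example, for X \sim Exponential(\lambda) the first population moment is E[X] = \frac{1}{\lambda}; equating it to \bar X gives \hat \lambda = \frac{1}{\bar X}. For N(\mu, \sigma^2), equating the first two moments gives \hat \mu = \bar X and \hat \sigma^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2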
L(\theta) = f(x_1, \cdots, x_n; \theta_1, \cdots, \theta_m) = \prod_{i=1}^n f(x_i; \theta_1, \cdots, \theta_m)
When the x_i are observed values, L(\theta) is regarded as a function of \theta and is called the likelihood function. The maximum likelihood estimates \hat \theta are the values that maximize L(\theta), or equivalently \ln L(\theta). The maximum likelihood estimator (MLE) is obtained by replacing each x_i with X_i
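As a minimal numerical sketch (assuming NumPy and SciPy are available; the sample size and true rate are illustrative choices), the MLE for Exponential(\lambda) can be found by minimizing the negative log-likelihood and checked against the closed form \hat \lambda = 1/\bar x:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)   # simulated sample, true rate lambda = 0.5

def neg_log_lik(lam):
    # Exponential(lambda): ln L(lambda) = n ln(lambda) - lambda * sum(x_i)
    return -(x.size * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1 / x.mean())   # numerical MLE vs closed-form 1/x-bar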
Let \Theta be the unknown parameter, regarded as a random variable. We know its pdf/pmf f_\Theta(\theta), called the prior. Then the distribution of \Theta given the data is
f_{\Theta|X}(\theta|x) = \frac{f_{X|\Theta}(x|\theta)f_\Theta(\theta)}{\int f_{X|\Theta}(x|\theta)f_\Theta(\theta) d\theta}
The pdf f_{\Theta|X}(\theta|x) is called the posterior.
f_{\Theta|X}(\theta|x) \propto f_{X|\Theta}(x|\theta)f_\Theta(\theta)
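A standard conjugate example: with a Beta(a, b) prior on p and n Bernoulli(p) observations with s = \sum_{i=1}^n x_i successes,
f_{\Theta|X}(p|x) \propto p^s(1 - p)^{n - s} \cdot p^{a - 1}(1 - p)^{b - 1} = p^{a + s - 1}(1 - p)^{b + n - s - 1}
so the posterior is Beta(a + s, b + n - s)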
\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^n K(\frac{x - X_i}{h})
where K(t) is a kernel function and h > 0 is the bandwidth; \hat f_h is the kernel density estimator of the underlying pdf f
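A minimal sketch of this formula (assuming NumPy; the Gaussian kernel and the bandwidth h = 0.4 are illustrative choices):

import numpy as np

def kde(x, data, h):
    # hat f_h(x) = 1/(n h) * sum_i K((x - X_i) / h), with Gaussian kernel K
    t = (x[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (data.size * h)

rng = np.random.default_rng(1)
data = rng.normal(size=200)            # sample whose density we estimate
grid = np.linspace(-4, 4, 81)
dens = kde(grid, data, h=0.4)          # estimated density values on the grid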