Linear Regression Error Decomposition
Created on July 16, 2023
Written by Some author
Read time: 1 minute
Summary: A short derivation of the error decomposition of OLS
For each observation, add and subtract the fitted value $\hat{y_i}$:
$$y_i - \bar{y} = (y_i - \hat{y_i}) + (\hat{y_i} -\bar{y})$$
Squaring both sides and summing over all observations gives
$$ \sum (y_i-\bar{y})^2 = \sum (y_i - \hat{y_i})^2 + \sum (\hat{y_i} -\bar{y})^2 + 2\sum( y_i - \hat{y_i})(\hat{y_i} -\bar{y})$$
The cross term vanishes. To see this, substitute $\hat{y_i} = b_0 + b_1 x_i$ and the OLS intercept $b_0 = \bar{y} - b_1 \bar{x}$:
$$( y_i - \hat{y_i})(\hat{y_i} -\bar{y}) = (y_i - b_0 - b_1 x_i)(b_0 + b_1 x_i -\bar{y}) = (y_i - \bar{y} + b_1\bar{x} - b_1 x_i)(\bar{y} - b_1\bar{x} + b_1 x_i -\bar{y})$$
$$= (y_i - \bar{y} + (\bar{x} - x_i)b_1) ( b_1 x_i- \bar{x} b_1 ) $$
$$= (y_i - \bar{y} -(x_i -\bar{x}) b_1)b_1 (x_i -\bar{x})$$
$$= b_1(y_i - \bar{y})(x_i -\bar{x}) - b_1^2 (x_i -\bar{x})^2$$
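The per-observation algebra above can also be checked symbolically. Below is a minimal sketch using sympy; the symbol names and the choice of sympy are illustrative additions, not part of the original derivation.

```python
import sympy as sp

# Symbols for a single observation; the intercept is replaced by the
# OLS expression b0 = ybar - b1*xbar used in the substitution above.
y, x, xbar, ybar, b1 = sp.symbols('y x xbar ybar b1')
b0 = ybar - b1 * xbar

cross = (y - b0 - b1 * x) * (b0 + b1 * x - ybar)              # (y_i - yhat_i)(yhat_i - ybar)
target = b1 * (y - ybar) * (x - xbar) - b1**2 * (x - xbar)**2  # claimed simplified form

print(sp.simplify(cross - target))  # prints 0, confirming the identity
```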
Now sum over all observations. The OLS slope is $b_1 = \frac{Cov(X,Y)}{Var(X)}$, and with the population definitions $\sum (y_i - \bar{y})(x_i - \bar{x}) = n\,Cov(X,Y)$ and $\sum (x_i - \bar{x})^2 = n\,Var(X)$, so
$$\sum ( y_i - \hat{y_i})(\hat{y_i} -\bar{y}) = \frac{Cov(X,Y)}{Var(X)}\, n\,Cov(X,Y) - \left(\frac{Cov(X,Y)}{Var(X)}\right)^2 n\,Var(X) = 0$$
Since the cross term vanishes, the decomposition becomes
$$\sum (y_i-\bar{y})^2 = \sum (y_i - \hat{y_i})^2 + \sum (\hat{y_i} -\bar{y})^2$$
We call $\sum(y_i - \bar{y})^2$ the total variation (without knowledge of $x$), $\sum(y_i - \hat{y_i})^2$ the variation remaining after using $x$, and $\sum (\hat{y_i} -\bar{y})^2$ the variation explained with the knowledge of $x$.
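As a quick numerical sanity check of the whole decomposition, here is a small numpy sketch; the synthetic data and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(size=200)  # any roughly linear data works

# Closed-form simple OLS fit: b1 = Cov(X,Y)/Var(X), b0 = ybar - b1*xbar
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)                   # total variation
sse = np.sum((y - y_hat) ** 2)                      # variation remaining
ssr = np.sum((y_hat - y.mean()) ** 2)               # variation explained
cross = np.sum((y - y_hat) * (y_hat - y.mean()))    # cross term from the derivation

print(cross)            # ~0 up to floating-point error
print(sst, sse + ssr)   # the two sides of the decomposition agree
```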