A couple of exercises from Sampling Distributions (Part Two)
Created on July 22, 2023
Written by Some author
Read time: 10 minutes
Summary: Chi-squared Distribution, Student's t-distribution, Fisher's F-distribution, Order Statistics.
Chi-squared distribution.
Suppose we have independent standard normal random variables $Z_1, \ldots, Z_n$; then the sum of their squares follows a $\chi$-squared distribution with $n$ degrees of freedom.
Let's first consider the case of one degree of freedom.
The standard normal distribution has the following pdf:
$$f(x) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{x^2}{2})$$
If $X = Z^2$ with $Z \sim N(0,1)$, then $\Pr[X \le x] = \Pr[-\sqrt{x} \le Z \le \sqrt{x}]$; differentiating with respect to $x$ shows that the $\chi$-squared distribution with one degree of freedom has the following pdf:
$$f(x) = \frac{1}{\sqrt{2\pi}}x^{-1/2} \exp\left(-\frac{x}{2}\right), \quad x > 0$$
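As a quick numerical check (not part of the derivation), we can compare this formula against the squares of simulated standard normals. The snippet below is a minimal sketch assuming numpy and scipy are available; the variable names are ours.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000) ** 2     # squares of standard normals

# The claimed pdf f(x) = x^{-1/2} exp(-x/2) / sqrt(2*pi) ...
grid = np.linspace(0.1, 8.0, 80)
pdf_formula = grid ** -0.5 * np.exp(-grid / 2) / np.sqrt(2 * np.pi)

# ... agrees with scipy's chi-squared pdf with 1 degree of freedom,
assert np.allclose(pdf_formula, stats.chi2.pdf(grid, df=1))

# and the simulated squares are consistent with chi^2_1 (KS test p-value).
print(stats.kstest(x, "chi2", args=(1,)).pvalue)
```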
Now, let's consider the moment generating function (defined for $t < \tfrac{1}{2}$):
$$M_X(t) = \mathbb{E}[\exp(Xt)] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}x^{-1/2} \exp\left(-\left(\tfrac{1}{2}-t\right)x\right) \, dx$$
Let $u=\left(\tfrac{1}{2}-t\right)x$, so that $dx = \left(\tfrac{1}{2}-t\right)^{-1} du$ and $x^{-1/2} = \left(\tfrac{1}{2}-t\right)^{1/2} u^{-1/2}$, which gives
$$\left(\tfrac{1}{2}-t\right)^{1/2}\left(\tfrac{1}{2}-t\right)^{-1} \int_0^{\infty} \frac{1}{\sqrt{2\pi}}u^{-1/2} \exp(-u)\, du $$
$$= \left(\tfrac{1}{2}-t\right)^{-1/2}\frac{\Gamma(1/2)}{\sqrt{2\pi}} = (1-2t)^{-1/2}$$
Since the $Z_i^2$ are independent, $X =Z_1^2 +Z_2^2+\cdots+Z_n^2$ will have MGF $(1-2t)^{-n/2}$.
In a different post, we will show how to recover the pdf from the MGF using the inverse Laplace transform.
We can easily get the mean and variance:
$$\mathbb{E}[X] = \sum_i\mathbb{E}[Z_i^2]=n$$
$$\operatorname{Var}[X] = n \operatorname{Var}[Z_1^2] = 2n$$
(Here $\operatorname{Var}[Z_1^2] = \mathbb{E}[Z_1^4]-\mathbb{E}[Z_1^2]^2 = 3-1 = 2$.)
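A quick simulation (a sketch, assuming numpy is available; names are ours) confirms the mean $n$, the variance $2n$, and the MGF $(1-2t)^{-n/2}$ at a test point:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                    # degrees of freedom
z = rng.standard_normal((500_000, n))
x = (z ** 2).sum(axis=1)                 # X = Z_1^2 + ... + Z_n^2

print(x.mean(), n)                       # mean ~ n
print(x.var(), 2 * n)                    # variance ~ 2n
t = 0.2                                  # any t < 1/2
print(np.exp(t * x).mean())              # empirical MGF ...
print((1 - 2 * t) ** (-n / 2))           # ... vs (1 - 2t)^{-n/2}
```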
Theorem. Let $X_1, \ldots, X_n \sim N\left(\mu, \sigma^2\right)$ be a random sample, then $\bar{X}$ and $S^2$ are independent, and
$$\frac{(n-1) S^2}{\sigma^2} \sim \chi_{n-1}^2$$
Here, $\bar{X} = \frac{1}{n}\sum_i X_i$ is the sample mean and $S^2 = \frac{1}{n-1}\sum_i (X_i-\bar{X})^2$ is the sample variance.
Lemma. If $X_1, \ldots, X_n \sim N\left(\mu, \sigma^2\right)$ is a random sample, then $\bar{X}$ is independent of $X_i-\bar{X}$ for all $i=1, \ldots, n$.
Let's consider the joint pdf:
$$\frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{1}{2\sigma^2}\sum_{i}(x_i-\mu)^2\right)$$
Consider the following transformation:
$$\bar{X} = Y_1, X_2 -\bar{X} = Y_2, X_3 -\bar{X} = Y_3, ..., X_n - \bar{X} = Y_n.$$
So $X_1 = n\bar{X}-X_2 -X_3-...-X_n = Y_1 - Y_2 - Y_3...-Y_n, X_2 = Y_1+Y_2 , X_3 = Y_1+Y_3, ..., X_n = Y_1 + Y_n. $
The Jacobian determinant of this transformation is the constant $n$, and so the joint pdf of $(Y_1, \ldots, Y_n)$ will be
$$\frac{n}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{1}{2\sigma^2}\left(Y_1 - \sum_{i=2}^n Y_i - \mu\right)^2\right) \exp\left(-\frac{1}{2\sigma^2}\sum_{i=2}^{n}(Y_1 +Y_i - \mu)^2\right)$$
The exponent (apart from the factor $-\frac{1}{2\sigma^2}$) is
$$\left(Y_1 - \sum_{i=2}^n Y_i - \mu\right)^2 +\sum_{i=2}^{n}(Y_1 +Y_i - \mu)^2$$
$$= (Y_1 - \mu)^2 -2 (Y_1-\mu)\sum_{i=2}^n Y_i +\left(\sum_{i=2}^n Y_i\right)^2 +\sum_{i=2}^n\left[(Y_1-\mu)^2 +2(Y_1-\mu)Y_i+Y_i^2\right]$$
$$=n(Y_1 - \mu)^2+ \sum_{i=2}^n Y_i^2+\left(\sum_{i=2}^n Y_i\right)^2$$
So the joint pdf can be split into
$$\frac{n}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{n(Y_1-\mu)^2}{2\sigma^2} \right)\exp\left(-\frac{\sum_{i=2}^n Y_i^2+\left(\sum_{i=2}^n Y_i\right)^2}{2\sigma^2} \right)$$
Therefore, $Y_1$ is independent of $(Y_2, \ldots, Y_n)$, and so $\bar{X}$ is independent of $X_i-\bar{X}$ for every $i$ (for $i=1$, note that $X_1 - \bar{X} = -\sum_{i=2}^n Y_i$).
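The lemma can also be checked empirically. Zero correlation does not by itself prove independence, but the sketch below (assuming numpy is available; names are ours) at least shows the sample mean and a deviation $X_1-\bar{X}$ are uncorrelated, as the factorization predicts:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000
x = rng.normal(loc=3.0, scale=2.0, size=(reps, n))   # reps independent samples

xbar = x.mean(axis=1)                    # sample means
dev = x[:, 0] - xbar                     # deviations X_1 - Xbar

# Independence implies zero correlation; the empirical value should be ~ 0.
print(np.corrcoef(xbar, dev)[0, 1])
```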
Now, let us restate the theorem and prove it.
Theorem. Let $X_1, \ldots, X_n \sim N\left(\mu, \sigma^2\right)$ be a random sample, then $\bar{X}$ and $S^2$ are independent, and
$$\frac{(n-1) S^2}{\sigma^2} \sim \chi_{n-1}^2$$
Since $S^2 = \frac{1}{n-1}\sum_{i}(X_i -\bar{X})^2$ and, by the lemma, $\bar{X}$ is independent of the vector of deviations $(X_1-\bar{X}, \ldots, X_n-\bar{X})$, it follows that $\bar{X}$ and $S^2$ are independent.
Now let's consider
$$\sum_i(X_i - \mu)^2 = \sum_i (X_i - \bar{X} +\bar{X} - \mu)^2 = \sum_i (X_i - \bar{X})^2 +n(\bar{X} - \mu)^2 + 2(\bar{X}-\mu)\sum_i(X_i -\bar{X}) = \sum_i (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,$$
since $\sum_i (X_i - \bar{X}) = 0$,
and
$$\sum_i (X_i - \bar{X})^2 = (n-1)S^2$$
so we will have
$$\sum_i(X_i - \mu)^2 = (n-1)S^2+ n(\bar{X} - \mu)^2$$
If we divide both sides by $\sigma^2$, we will have
$$\sum_i\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$$
We know that $\sum_i\left(\frac{X_i - \mu}{\sigma}\right)^2$ follows a $\chi$-squared distribution with $n$ degrees of freedom and $\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2$ follows a $\chi$-squared distribution with $1$ degree of freedom. Since the two terms on the right-hand side are independent, their MGFs multiply, and dividing $(1-2t)^{-n/2}$ by $(1-2t)^{-1/2}$ shows that $\frac{(n-1)S^2}{\sigma^2}$ follows a $\chi$-squared distribution with $n-1$ degrees of freedom.
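Here is a minimal simulation sketch (assuming numpy and scipy; names are ours) checking that $(n-1)S^2/\sigma^2$ behaves like a $\chi^2_{n-1}$ variable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma, reps = 8, 1.5, 100_000
x = rng.normal(loc=0.7, scale=sigma, size=(reps, n))

s2 = x.var(axis=1, ddof=1)               # sample variances S^2
stat = (n - 1) * s2 / sigma ** 2         # (n-1) S^2 / sigma^2

print(stat.mean(), n - 1)                # mean should be ~ n-1
print(stat.var(), 2 * (n - 1))           # variance should be ~ 2(n-1)
print(stats.kstest(stat, "chi2", args=(n - 1,)).pvalue)   # should not be tiny
```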
The upper $\alpha$-quantile $\chi_{\alpha, \nu}^2$ of the chi-squared distribution with $\nu$ degrees of freedom is defined by the following:
$$\Pr[X\ge \chi_{\alpha, \nu}^2] = \alpha$$
Example. Suppose a semiconductor company wants to test the thickness of their semiconductors. They tested a sample of size 20 (assuming the thicknesses come from a normal distribution $N\left(\mu, \sigma^2\right)$). The production process is declared "out of control" if, at significance level 0.01, the data indicate that $\sigma>0.60$. Suppose the test shows $s=0.84$; is the process out of control?
The test statistic is
$$\frac{(n-1)s^2}{0.60^2} = \frac{19 \times 0.84^2}{ 0.6^2} = 37.24$$
and our critical $\chi^2$-value is $\chi_{0.01, 19}^2 = 36.1909$. Since $37.24 > 36.19$, the process is out of control.
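For reference, the same numbers can be reproduced with scipy (a minimal sketch; the variable names are ours):

```python
from scipy import stats

n, s, sigma0, alpha = 20, 0.84, 0.60, 0.01
chi2_stat = (n - 1) * s ** 2 / sigma0 ** 2          # 37.24
crit = stats.chi2.ppf(1 - alpha, df=n - 1)          # chi^2_{0.01, 19} = 36.1909
print(chi2_stat, crit, chi2_stat > crit)            # True -> out of control
```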
The Student $t$-distribution
Suppose we have a random sample from a normal population $N\left(\mu, \sigma^2\right)$. Can we test the mean $\mu$ without knowing $\sigma^2$ ?
Let $Y \sim \chi^2_{\nu}$ and $Z \sim N(0, 1)$ be independent, then
$T = \frac{ Z}{\sqrt{Y/\nu}}$ has the probability density function given by
$$f_T(t) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\pi \nu} \Gamma(\nu/2)} (1 + t^2/\nu)^{-(\nu + 1)/2}, \quad \text{for } t \in \mathbb{R}.$$
Here $T$ is said to have Student's $t$-distribution with $\nu$ degrees of freedom, i.e., $T \sim t_\nu$.
The joint pdf of $(Z, Y)$ is the following:
$$\frac{1}{\sqrt{2\pi}} \exp( -\frac{z^2}{2}) \frac{1}{2^{\nu/2}\Gamma(\nu/2)}y^{\nu/2-1}\exp(-y/2)$$
Now let's consider the following transformation:
$(u, v) =(y, \frac{z}{\sqrt{y/\nu}}) $
so $(y, z) = (u, v\sqrt{u/\nu})$, with Jacobian matrix
$$\begin{pmatrix}1 & 0 \\ \frac{1}{2}v\nu^{-1/2}u^{-1/2} & \sqrt{u/\nu}\end{pmatrix}$$
So the determinant is $\sqrt{u/\nu}$
and then the joint pdf in terms of $(u, v)$ becomes
$$\frac{1}{\sqrt{2\pi\nu}} \exp\left( -\frac{uv^2 }{2\nu}\right) \frac{1}{2^{\nu/2}\Gamma(\nu/2)}u^{(\nu-1)/2}\exp(-u/2)= \frac{1}{\sqrt{2\pi\nu}2^{\frac{\nu}{2}}\Gamma(\nu/2)}u^{\frac{\nu-1}{2}}\exp\left(-\frac{u}{2}(1+\frac{v^2}{\nu})\right)$$
Now let's compute the following marginal pdf:
$$\int_0^{\infty}\frac{1}{\sqrt{2\pi\nu}2^{\frac{\nu}{2}}\Gamma(\nu/2)}u^{\frac{\nu-1}{2}}\exp\left(-\frac{u}{2}(1+\frac{v^2}{\nu})\right) \, du \\= \frac{1}{\sqrt{2\pi\nu}2^{\frac{\nu}{2}}\Gamma(\nu/2)}\int_0^{\infty}u^{\frac{\nu-1}{2}}\exp\left(-\frac{u}{2}(1+\frac{v^2}{\nu})\right) \, du$$
Let $t = \frac{u}{2}(1+\frac{v^2}{\nu})$, $$u = \frac{2t}{1+\frac{v^2}{\nu}}, du = \frac{2dt}{1+\frac{v^2}{\nu}}$$
The marginal pdf will become
$$\frac{2^{\frac{1}{2}}}{\sqrt{2\pi\nu}\Gamma(\nu/2)} \left(\frac{1}{1+\frac{v^2}{\nu}}\right)^{\frac{\nu+1}{2}}\int_0^{\infty} t^{\frac{\nu-1}{2}}\exp(-t)\, dt = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\Gamma(\nu/2)} \left(\frac{1}{1+\frac{v^2}{\nu}}\right)^{\frac{\nu+1}{2}},$$
which proves our theorem.
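A quick check of the result (a sketch assuming numpy and scipy; names are ours): simulate $T=Z/\sqrt{Y/\nu}$ and compare both the derived density and the samples against scipy's $t_\nu$:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

rng = np.random.default_rng(4)
nu, reps = 7, 300_000
z = rng.standard_normal(reps)
y = rng.chisquare(df=nu, size=reps)
t_samples = z / np.sqrt(y / nu)          # T = Z / sqrt(Y/nu)

# The derived density matches scipy's t pdf ...
v = np.linspace(-4, 4, 81)
log_c = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(np.pi * nu)
pdf_formula = np.exp(log_c) * (1 + v ** 2 / nu) ** (-(nu + 1) / 2)
assert np.allclose(pdf_formula, stats.t.pdf(v, df=nu))

# ... and the simulated T is consistent with t_nu.
print(stats.kstest(t_samples, "t", args=(nu,)).pvalue)
```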
Theorem. Suppose $\bar{X}$ and $S^2$ are respectively the sample mean and sample variance of a random sample from $N\left(\mu, \sigma^2\right)$, then
$$T=\frac{\bar{X}-\mu}{S / \sqrt{n}} \sim t_{n-1}$$
Proof.
We know that $u=\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ and $z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$, and by the previous theorem they are independent, and so
$$S^2 = \frac{\sigma^2 u}{n-1}, S = \sigma\sqrt{\frac{u}{n-1}}$$
$$\frac{\bar{X}-\mu}{S / \sqrt{n}}= \frac{\bar{X}-\mu}{\sigma\sqrt{\frac{u}{n-1}} / \sqrt{n}} = \frac{z}{\sqrt{\frac{u}{n-1}}}$$
So $T$ follows the $t$ distribution with $\nu = n-1$ degrees of freedom.
Example. Suppose we obtain a random sample of size 16 from a normal population. Using this sample, we figure that $\bar{x}=16.1$ and $s=2.1$. Can we declare that the true mean $\mu>12.0$ with confidence 0.99 ?
The $t$ statistic is
$$\frac{16.1-12.0}{2.1/\sqrt{16}} = 7.809$$
For this one-sided claim at confidence level 0.99, the critical value is $t_{0.01, 15} = 2.602$ (the more conservative two-sided value $t_{0.005, 15} = 2.9467$ is also exceeded). Since $7.809$ is larger, we can make the claim.
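The critical values above can be reproduced with scipy (a minimal sketch; variable names are ours):

```python
import numpy as np
from scipy import stats

n, xbar, s, mu0 = 16, 16.1, 2.1, 12.0
t_stat = (xbar - mu0) / (s / np.sqrt(n))            # 7.809
crit_one_sided = stats.t.ppf(0.99, df=n - 1)        # t_{0.01, 15} ~ 2.602
crit_two_sided = stats.t.ppf(0.995, df=n - 1)       # t_{0.005, 15} ~ 2.947
print(t_stat, crit_one_sided, crit_two_sided)       # 7.809 exceeds both
```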
Fisher's $F$-distribution
Theorem. Suppose $U \sim \chi_{\nu_1}^2$ and $V \sim \chi_{\nu_2}^2$ are independent, then
$$ F=\frac{U / \nu_1}{V / \nu_2} $$
has the pdf given by
$$ g(f)= \begin{cases}\frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \cdot f^{\frac{\nu_1}{2}-1}\left(1+\frac{\nu_1}{\nu_2} f\right)^{-\frac{1}{2}\left(\nu_1+\nu_2\right)}, & \text { if } f>0 \\ 0, & \text { elsewhere. }\end{cases} $$
Here $F$ is said to have the $F$-distribution with degrees of freedoms $\nu_1$ and $\nu_2$, denoted by $F \sim F_{\nu_1, \nu_2}$.
Let's consider the joint pdf:
$$\frac{1}{2^{\nu_1 / 2} \Gamma(\nu_1 / 2)} u^{(\nu_1-2) / 2} e^{-u / 2} \frac{1}{2^{\nu_2 / 2} \Gamma(\nu_2 / 2)} v^{(\nu_2-2) / 2} e^{-v / 2} $$
$$\frac{1}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} u^{(\nu_1-2)/2}v^{(\nu_2-2)/2}\exp(-(u+v)/2)$$
Consider the following transformation $(a,b) = (u/\nu_1, v/\nu_2)$; then we will have $(u,v) = (a\nu_1, b\nu_2)$, with Jacobian matrix
$$\begin{pmatrix}\nu_1 & 0 \\ 0 & \nu_2\end{pmatrix}$$
and the determinant will be $\nu_1 \nu_2$.
The joint distribution will become
$$\frac{1}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} (a\nu_1)^{(\nu_1-2)/2}(b\nu_2)^{(\nu_2-2)/2}\exp(-(a\nu_1+b\nu_2)/2)\nu_1\nu_2$$
$$= \frac{\nu_1^{(\nu_1-2)/2} \nu_2^{(\nu_2-2)/2}}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} a^{(\nu_1-2)/2}b^{(\nu_2-2)/2}\exp(-(a\nu_1+b\nu_2)/2)\nu_1\nu_2$$
Let $(x,y)=(b, a/b)$ then we will have $(a,b) = (xy, x)$
$$\begin{pmatrix} y & x \\ 1 & 0\end{pmatrix}$$
And so, using that the Jacobian determinant has absolute value $x$, we will have
$$= \frac{\nu_1^{(\nu_1-2)/2} \nu_2^{(\nu_2-2)/2}}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} (xy)^{(\nu_1-2)/2}x^{(\nu_2-2)/2}\exp(-(xy\nu_1+x\nu_2)/2)x\nu_1\nu_2$$
$$= \frac{\nu_1^{\nu_1/2} \nu_2^{\nu_2/2}y^{(\nu_1-2)/2}}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} x^{(\nu_1+\nu_2-2)/2}\exp\left(-\frac{(\nu_2+y\nu_1)x}{2}\right)$$
Let $t= \frac{(\nu_2+y\nu_1)x}{2}$, then $x = \frac{2t}{\nu_2+y\nu_1}$, $dx = \frac{2\,dt}{\nu_2+y\nu_1}$.
Integrating out $x$, the marginal pdf of $y$ is
$$\int_0^{\infty}\frac{\nu_1^{\nu_1/2} \nu_2^{\nu_2/2}y^{(\nu_1-2)/2}}{2^{(\nu_1+\nu_2)/2}\Gamma(\nu_1/2)\Gamma(\nu_2/2)} t^{(\nu_1+\nu_2-2)/2}\exp\left(-t\right) \left(\frac{2}{\nu_2 +y \nu_1}\right)^{(\nu_1+\nu_2)/2} \, dt$$
$$= \frac{\Gamma(\frac{\nu_1 + \nu_2}{2})}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} y^{\nu_1/2-1}\left({1 + \frac{\nu_1}{\nu_2}y }\right)^{-(\nu_1+\nu_2)/2} $$
This is the $F$-distribution with degrees of freedom $\nu_1$ and $\nu_2$, which proves the theorem.
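As before, we can sanity-check the derivation by simulating the ratio of scaled chi-squared variables (a sketch assuming numpy and scipy; names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu1, nu2, reps = 5, 9, 300_000
u = rng.chisquare(df=nu1, size=reps)
v = rng.chisquare(df=nu2, size=reps)
f_samples = (u / nu1) / (v / nu2)        # F = (U/nu1) / (V/nu2)

# The simulated ratio should be consistent with F_{nu1, nu2}.
print(stats.kstest(f_samples, "f", args=(nu1, nu2)).pvalue)
print(f_samples.mean(), nu2 / (nu2 - 2))  # mean of F is nu2/(nu2-2) when nu2 > 2
```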
Application of the $F$ statistic: comparing the ratio of the variances $\sigma_1^2$ and $\sigma_2^2$ of two independent normal populations.
Theorem. Suppose there are two independent normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, and $S_1^2$ and $S_2^2$ are the sample variances of two random samples of size $n_1$ and $n_2$ from these two populations. Then
$$F=\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}=\frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \sim F_{n_1-1, n_2-1}$$
Proof. $$u_i=\frac{(n_i-1)S_i^2}{\sigma_i^2} \sim \chi_{n_i-1}^2$$
and so $$F = \frac{u_1/(n_1-1)}{u_2/(n_2-1)} \sim F_{n_1-1,n_2-1}.$$
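A small simulation sketch of the variance-ratio statistic (assuming numpy and scipy; names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n1, n2, sigma1, sigma2, reps = 10, 13, 2.0, 0.5, 100_000
x1 = rng.normal(loc=1.0, scale=sigma1, size=(reps, n1))
x2 = rng.normal(loc=5.0, scale=sigma2, size=(reps, n2))

s1sq = x1.var(axis=1, ddof=1)            # sample variances S_1^2
s2sq = x2.var(axis=1, ddof=1)            # sample variances S_2^2
f_stat = (s1sq / sigma1 ** 2) / (s2sq / sigma2 ** 2)

# Should be consistent with F_{n1-1, n2-1}.
print(stats.kstest(f_stat, "f", args=(n1 - 1, n2 - 1)).pvalue)
```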
Order statistics.
Order statistics describe how the $k$-th smallest item in a sample is distributed. A formal definition is the following. Suppose $X_1, X_2, \ldots , X_n$ are drawn from a distribution with pdf $f$; if we order them, we obtain $Y_1 \le Y_2 \le \cdots \le Y_n$, where $Y_i$ denotes the $i$-th smallest element.
Theorem. The pdf $g_r$ of $Y_r$ is given by
$$g_r\left(y_r\right)=\frac{n !}{(r-1) !(n-r) !}\left[\int_{-\infty}^{y_r} f(x) d x\right]^{r-1} f\left(y_r\right)\left[\int_{y_r}^{\infty} f(x) d x\right]^{n-r}$$
for $-\infty<y_r<\infty$.
Proof: See "Introduction to Beta Distribution" in this blog.
From the above, we will have:
• The minimum $Y_1$ has pdf
$f_{Y_1}(y) = n f(y) \left[\int_{y}^{\infty} f(x) d x\right]^{n-1}$
and
• The maximum $Y_n$ has pdf
$f_{Y_n}(y) = n f(y) \left[\int_{-\infty}^{y} f(x) d x\right]^{n-1}$
• If $n = 2m + 1$ is odd, then the sample median $Y_{m+1}$ has pdf
$f_{Y_{m+1}}(y) = \frac{n!}{m! m!}f(y) \left[\int_{y}^{\infty} f(x) d x\right]^{m}\left[\int_{-\infty}^{y} f(x) d x\right]^{m}$
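To make the median formula concrete: for a Uniform(0,1) sample we have $f(y)=1$, $\int_{-\infty}^{y} f(x)\,dx = y$ and $\int_{y}^{\infty} f(x)\,dx = 1-y$, so the median pdf reduces to $\frac{n!}{m!\,m!}\,y^m(1-y)^m$, the $\operatorname{Beta}(m+1, m+1)$ density. A minimal simulation sketch (assuming numpy and scipy; names are ours):

```python
import numpy as np
from math import factorial
from scipy import stats

rng = np.random.default_rng(7)
m = 3
n = 2 * m + 1                            # odd sample size
x = rng.uniform(size=(200_000, n))
medians = np.sort(x, axis=1)[:, m]       # Y_{m+1}, the sample median

# n!/(m! m!) y^m (1-y)^m is exactly the Beta(m+1, m+1) density ...
y = np.linspace(0.01, 0.99, 99)
pdf_formula = factorial(n) / factorial(m) ** 2 * y ** m * (1 - y) ** m
assert np.allclose(pdf_formula, stats.beta.pdf(y, m + 1, m + 1))

# ... and the simulated medians are consistent with it.
print(stats.kstest(medians, "beta", args=(m + 1, m + 1)).pvalue)
```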
Example. Suppose $X_1, \ldots, X_n$ is a random sample from $\operatorname{Exp}(\theta)$, i.e., the pdf is $f(x)=\frac{1}{\theta} e^{-x / \theta}$ for $x > 0$. The cdf is
$F(x)=\int_0^{x} \frac{1}{\theta}\exp(-y/\theta)\, dy = 1-\exp(-x/\theta)$
So, applying the formulas above,
$$f_{Y_1} (y) = \frac{n}{\theta} \exp(-ny/\theta)$$
$$f_{Y_n}(y) = \frac{n}{\theta} \exp(-y/\theta) [1-\exp(-y/\theta)]^{n-1}$$
$$f_{Y_{m+1}}(y) = \frac{(2m+1)!}{m!\,m!}\frac{1}{\theta} \exp(-y(m+1)/\theta)[1-\exp(-y/\theta)]^{m}, \quad n = 2m+1$$
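The first formula says, in particular, that the minimum of $n$ i.i.d. $\operatorname{Exp}(\theta)$ variables is again exponential, with mean $\theta/n$. A minimal simulation sketch (assuming numpy and scipy; names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
theta, n, reps = 2.0, 6, 200_000
x = rng.exponential(scale=theta, size=(reps, n))

y1 = x.min(axis=1)                       # sample minima Y_1

# f_{Y_1}(y) = (n/theta) exp(-n y/theta), i.e. Y_1 ~ Exp(theta/n).
print(y1.mean(), theta / n)              # mean ~ theta/n
print(stats.kstest(y1, "expon", args=(0, theta / n)).pvalue)
```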