Introduction to different moments of distributions (Planned)
Created on July 30, 2023
Last modified on August 05, 2023
Written by Some author
Read time: 7 minutes
Summary: We will look at how to compute the moments of various distributions, along with their skewness and kurtosis, and discuss the relationship between the moments and the moment generating function.
Understanding Moments
We already have the following properties of the moment generating function.
If $X_i$ has moment generating function $f_{X_i}(z;i)$, then we will have
$f_{\alpha X_i}(z;i) = \mathbb{E}_{X_i}[\exp(\alpha X_iz)] = \mathbb{E}_{X_i}[\exp(X_i (\alpha z))] = f_{X_i}(\alpha z; i)$
and
$f_{\alpha + X_i}(z;i) = \mathbb{E}_{X_i}[\exp((\alpha+X_i)z)] = \exp(\alpha z)\mathbb{E}_{X_i}[\exp(X_iz)] = \exp(\alpha z) f_{X_i}(z;i).$
and, when the $X_i$ are independent,
$f_{\sum_i X_i}(z) = \mathbb{E}_{\sum_i X_i} [\exp(z\sum_i X_i)] = \prod_i \mathbb{E}_{X_i}[\exp(X_iz)] = \prod_i f_{X_i}(z;i).$
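These three identities can be checked numerically against the closed-form normal mgf $\exp(az + \tfrac{1}{2}b^2 z^2)$ (derived later in this post); a minimal sketch, with arbitrary test values:

```python
import math

# Closed-form mgf of a Normal(a, b^2): exp(a*z + b^2*z^2/2).
def normal_mgf(a, b):
    return lambda z: math.exp(a * z + 0.5 * b**2 * z**2)

a, b, alpha, z = 0.3, 1.2, 2.5, 0.7
f = normal_mgf(a, b)

# Scaling: alpha*X is Normal(alpha*a, (alpha*b)^2), and its mgf equals f(alpha*z).
scaled = normal_mgf(alpha * a, alpha * b)
assert math.isclose(scaled(z), f(alpha * z))

# Shifting: alpha + X is Normal(alpha + a, b^2), and its mgf equals exp(alpha*z)*f(z).
shifted = normal_mgf(alpha + a, b)
assert math.isclose(shifted(z), math.exp(alpha * z) * f(z))

# Summing independent normals: variances add, and the mgf is the product of the mgfs.
g = normal_mgf(0.5, 0.8)
summed = normal_mgf(a + 0.5, math.hypot(b, 0.8))
assert math.isclose(summed(z), f(z) * g(z))
```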
Now, let's move beyond scaling, shifting, and summing and consider a more interesting operation.
Compounding pgf:
If $X$ has pgf $P_X(z)$ and $Y$ has pgf $P_Y(z)$, then the compound distribution $Z = \sum_{i=1}^{X} Y_i$ (a random sum of $X$ i.i.d. copies of $Y$) has the following pgf:
$$P_Z(z) = P_X(P_Y(z)).$$
We have already seen the following properties:
$$M_X(z) = P_X(\exp(z)).$$
So
$$M_Z(z) = P_{Z}(\exp(z)) = P_X(P_Y(\exp(z)))= P_X(M_Y(z)).$$
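As a quick sanity check of the composition rule (a sketch assuming `sympy` is available): compounding a Poisson($\lambda$) primary with a Bernoulli($p$) secondary should give the thinned Poisson($\lambda p$).

```python
import sympy as sp

z, lam, p = sp.symbols('z lam p', positive=True)

# pgf of the Poisson(lam) primary and the Bernoulli(p) secondary
P_X = sp.exp(lam * (z - 1))
P_Y = 1 - p + p * z

# Compound pgf: P_Z(z) = P_X(P_Y(z))
P_Z = P_X.subs(z, P_Y)

# Bernoulli-thinning a Poisson(lam) gives Poisson(lam*p)
assert sp.simplify(P_Z - sp.exp(lam * p * (z - 1))) == 0
```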
Now, let's consider some of the moments from a distribution.
We know that a distribution's first moment can be calculated by taking the first derivative of its moment generating function and evaluating it at zero. And the second moment is just the second derivative evaluated at zero.
From now on, for any distribution $Z$, we will denote its $i$-th (raw) moment by $Z_i$.
Then the skewness $\gamma_1$ can be expressed as $$\gamma_1[Z] :=\frac{Z_3 - 3Z_1 Z_2 + 2Z_1^3}{(Z_2 - Z_1^2)^{3/2}}$$
and the kurtosis $\text{Kurt}$ can be expressed as
$$\text{Kurt}[Z] = \frac{Z_4 - 4Z_1 Z_3 + 6Z_1^2 Z_2 - 3Z_1^4}{(Z_2 - Z_1^2)^2}.$$
(The numerators are the third and fourth central moments written in terms of raw moments.)
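Computing skewness and kurtosis from raw moments is mechanical. A minimal Python sketch (the helper names are mine), checked against the Exponential(1) distribution, whose $n$-th raw moment is $n!$:

```python
from math import factorial

def skewness(m1, m2, m3):
    # third central moment divided by variance^(3/2)
    var = m2 - m1**2
    return (m3 - 3*m1*m2 + 2*m1**3) / var**1.5

def kurtosis(m1, m2, m3, m4):
    # fourth central moment divided by variance^2
    var = m2 - m1**2
    return (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / var**2

# Exponential(1): n-th raw moment is n!; skewness 2, (non-excess) kurtosis 9
m = [factorial(n) for n in range(5)]
assert skewness(m[1], m[2], m[3]) == 2.0
assert kurtosis(m[1], m[2], m[3], m[4]) == 9.0
```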
There's also the important concept of cumulants. Cumulants matter statistically because the first, second, and third cumulants equal the mean, the variance, and the third central moment, respectively. However, higher-order cumulants no longer coincide with the central moments.
A definition of cumulant generating function is the following:
$$K_X(z) = \log (M_X(z)).$$
And we can expand it as a Maclaurin series and read off the cumulants from the coefficients.
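For example (a sketch assuming `sympy` is available): for the Poisson($\lambda$) mgf, every cumulant equals $\lambda$, which we can read off by differentiating $K_X$ at zero.

```python
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)

# Poisson(lam) mgf and its cumulant generating function
M = sp.exp(lam * (sp.exp(z) - 1))
K = sp.log(M)

# The n-th cumulant is the n-th derivative of K at z = 0
cumulants = [sp.simplify(K.diff(z, n).subs(z, 0)) for n in range(1, 5)]
assert cumulants == [lam, lam, lam, lam]
```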
Now, let's work out the moment generating functions, skewness, and kurtosis of several well-known distributions.
$\textbf{Normal Distribution}$
First, let's consider the normal distribution:
$$f_Z(z) = (2\pi b^2)^{-1/2} \exp\left(-\frac{1}{2}(z-a)^2/b^2\right)$$
Then it will have the following mgf:
$$\mathbb{E}_Z[\exp(Zt)] = \int_{-\infty}^{\infty} \exp(zt)(2\pi b^2)^{-1/2} \exp\left(-\frac{1}{2}(z-a)^2/b^2\right) \, dz\\=(2\pi b^2)^{-1/2}\int_{-\infty}^{\infty} \exp\left(zt -\frac{1}{2}(z-a)^2/b^2 \right)\, dz$$
Looking closely at the exponent $zt -\frac{1}{2}(z-a)^2/b^2$, we have
$$tz - \frac{1}{2}\frac{z^2 +a^2 -2az}{b^2} = -\frac{1}{2b^2} \left(-2b^2 tz+z^2 +a^2 -2az\right)= -\frac{1}{2b^2}(z^2 -(2a+2b^2t)z+a^2)$$
Notice that $(a+b^2t)^2 = a^2 + 2ab^2 t +b^4 t^2$ and so
$$tz - \frac{1}{2}\frac{z^2 +a^2 -2az}{b^2} = -\frac{1}{2b^2}\left(z^2 - (2a+2b^2t)z+(a+b^2t)^2 -2ab^2t -b^4 t^2\right) = -\frac{1}{2b^2}(z-a-b^2t)^2 +at +\frac{1}{2}b^2 t^2$$
And so $$\mathbb{E}_Z[\exp(Zt)] =(2\pi b^2)^{-1/2}\int_{-\infty}^{\infty} \exp\left(-\frac{1}{2b^2}(z-a-b^2t)^2 +at +\frac{1}{2}b^2 t^2 \right)\, dz\\=\exp\left( at +\frac{1}{2}b^2 t^2 \right)(2\pi b^2)^{-1/2}\int_{-\infty}^{\infty} \exp\left(-\frac{1}{2b^2}z^2\right) \, dz \\= \exp\left( at +\frac{1}{2}b^2 t^2 \right)\pi^{-1/2} \int_{-\infty}^{\infty} \exp(-z^2)\, dz $$ where we shifted $z \mapsto z+a+b^2t$ in the second step and substituted $z \mapsto \sqrt{2}\,bz$ in the third.
We know that $\int_{-\infty}^{\infty}\exp(-z^2)\,dz = \sqrt \pi$ so we will have the moment generating function for normal distribution to be $\exp\left( at +\frac{1}{2}b^2 t^2 \right)$.
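We can double-check this closed form numerically; a sketch using trapezoidal quadrature (the integrand decays like a Gaussian, so truncating the grid is harmless; the values of `a`, `b`, `t` are arbitrary test values):

```python
import math

def normal_mgf_numeric(a, b, t, lo=-40.0, hi=40.0, n=4001):
    # Trapezoidal quadrature of exp(z*t) times the Normal(a, b^2) pdf.
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        pdf = math.exp(-0.5 * ((z - a) / b) ** 2) / math.sqrt(2 * math.pi * b * b)
        w = 0.5 if i in (0, n - 1) else 1.0   # trapezoid endpoint weights
        total += w * math.exp(z * t) * pdf * h
    return total

a, b, t = 1.0, 2.0, 0.3   # arbitrary test values
closed_form = math.exp(a * t + 0.5 * b**2 * t**2)
assert math.isclose(normal_mgf_numeric(a, b, t), closed_form, rel_tol=1e-9)
```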
Now, let's consider the normal distribution's cousin, the chi-squared distribution (one transformation of the normal distribution).
$\chi^2\textbf{ distribution}$
We know that a chi-squared random variable (with one degree of freedom) is the square of a standard normal random variable, i.e. if $X$ follows a $\chi^2$ distribution, then we will have
$$X =Z^2.$$
where $Z$ is a standard normal distribution.
Then its pdf will have the following form:
$$(2\pi z)^{-1/2}\exp\left(-\frac{z}{2}\right), \qquad z > 0.$$
$\textbf{Generalized moments}$
Suppose we have a smooth (infinitely differentiable) function $S(z;\theta)$ with hyperparameter $\theta$; it may be a moment generating function or any other function. Then we can define the first moment of the function as $\partial_z S(z;\theta) \mid_{z = 0}$, the second moment as $\partial^2_z S(z;\theta) \mid_{z=0}$, and in general the $n$-th moment as $\partial_z^n S(z;\theta) \mid_{z=0}$.
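This recipe is mechanical enough to automate. A sketch with `sympy` (the helper `moment` is my own name), sanity-checked on the Poisson mgf, whose first two moments are known:

```python
import sympy as sp

z = sp.symbols('z')

def moment(S, n):
    # n-th generalized moment: the n-th derivative of S evaluated at z = 0
    return S.diff(z, n).subs(z, 0)

# Sanity check on the Poisson mgf S(z) = exp(lam*(e^z - 1)) with lam = 3
lam = 3
S = sp.exp(lam * (sp.exp(z) - 1))
assert moment(S, 1) == lam            # mean of Poisson(lam)
assert moment(S, 2) == lam + lam**2   # second raw moment
```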
Example. Suppose we have the following function with hyperparameters $m, k$ defined as $$I(m,k) = \left[1-\beta(\exp(z)-1)\right]^m \exp(kz)$$
Then we will have
$$\partial_z I(m,k) = -\beta m\left[1-\beta(\exp(z)-1)\right]^{m-1} \exp((k+1)z) + k\left[1-\beta(\exp(z)-1)\right]^m \exp(kz)$$
and so we will have
$$\partial_z I(m,k) = -\beta m I(m-1, k+1) + k I(m,k)$$
We can even get an ordinary differential equation from this, and we already know its solution.
$$\partial_z f - k f = -\beta m I(m-1, k+1)$$
whose solution is $[1-\beta (\exp(z)-1)]^m \exp(kz).$
Now let's compute the expected value $M_1$ of the function $I(-r,0)$, we will get
$$M_1=\partial_z I(-r,0) \mid_{z =0} = \beta rI(-r-1, 1) \mid_{z =0} = \beta r $$
and its second moment $M_2$ will be
$$M_2=\partial^2_z I(-r,0) \mid_{z=0}= \beta r\partial_z I(-r-1, 1)\mid_{z=0} = \beta r (\beta(r+1)+1) = \beta r(\beta r + \beta + 1)$$
And so the variance in the ordinary sense will be
$$\beta r(\beta r + \beta+1)-\beta^2 r^2 =\beta r(\beta+1)$$
Now if we compute the third moment $M_3$, we will get the following:
$$M_3 = \partial^3_z I(-r,0) \mid_{z=0} = \beta r\partial^2_z I(-r-1, 1)\mid_{z=0} \\= \beta r \partial_z (\beta (r+1)I(-r-2 , 2)+I(-r-1,1))\mid_{z=0} \\ =\beta r [\beta (r+1)\partial_zI(-r-2 , 2)+\partial_zI(-r-1,1)] \mid_{z=0} \\=\beta r[\beta (r+1) (\beta (r+2)+2) +\beta r + \beta + 1]\\ = \beta r [\beta^2 (r+1) (r+2) + 3 \beta (r+1) +1 ] \\= \beta^3 r(r+1)(r+2) +3 \beta^2 r(r+1)+3\beta r -2\beta r \\= \beta^3 r(r+1)(r+2) +3M_2 -2M_1$$
Notice $(M_2 - M_1)^2 = (\beta r \beta (r+1))^2 = \beta^4 r^2 (r+1)^2$ so $\beta^3 r (r+1)^2 = \frac{(M_2-M_1)^2}{M_1}$
and so we will have
$$M_3 = \frac{(M_2-M_1)^2}{M_1} \frac{r+2}{r+1} +3 M_2 - 2M_1.$$
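This relation can be verified symbolically (a sketch assuming `sympy`; $r = 5/2$ and $\beta = 1/3$ are arbitrary test values):

```python
import sympy as sp

z = sp.symbols('z')
r, beta = sp.Rational(5, 2), sp.Rational(1, 3)  # arbitrary test values

# I(-r, 0) = (1 - beta*(exp(z) - 1))^(-r), the function analyzed above
I = (1 - beta * (sp.exp(z) - 1)) ** (-r)
M1, M2, M3 = [sp.simplify(I.diff(z, n).subs(z, 0)) for n in (1, 2, 3)]

assert M1 == beta * r
assert M2 == beta * r * (beta * r + beta + 1)
# M3 = (M2 - M1)^2 / M1 * (r + 2)/(r + 1) + 3*M2 - 2*M1
assert sp.simplify(M3 - ((M2 - M1)**2 / M1 * (r + 2) / (r + 1) + 3*M2 - 2*M1)) == 0
```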
A more concrete case of the above is the extended truncated negative binomial (ETNB) distribution; notice that the ETNB's mgf has the following form:
$$M_X(z)=\frac{1}{(1+\beta)^r-1}\left(\left(\frac{1+\beta}{1-\beta(\exp(z)-1)}\right)^r-1\right) \\=C_1\left(\left(\frac{1+\beta}{1-\beta(\exp(z)-1)}\right)^r\right) -C_2 \\= C_3 (1 -\beta(\exp(z)-1))^{-r}-C_2\\= C_3 I(-r,0)-C_2$$
So it's essentially the same as $I(-r,0)$, up to a constant scaling and an extra constant term. As we will see shortly, the constant terms do not affect the computation.
$\textbf{Theorem 1.}$ If $M_k$ is the $k$-th moment of a function $g$, then $\alpha M_k$ is the $k$-th moment of $\alpha g + \beta$ for every $k \geq 1$, where $\alpha, \beta$ are constants (this $\beta$ is a generic constant, unrelated to the distribution parameter $\beta$ above).
$\textbf{Proof.}$
We observe that $\partial_t^k (\alpha g+\beta)\mid_{t=0} = \alpha\, \partial_t^k g\mid_{t=0} = \alpha M_k$ for all $k \geq 1$, since the additive constant $\beta$ disappears after the first differentiation.
We can use Theorem 1 to show that the relation
$$M_3 = \frac{(M_2-M_1)^2}{M_1} \frac{r+2}{r+1} +3 M_2 - 2M_1.$$
still holds for the case of ETNB.
More specifically, consider $N_3 = C_3 M_3$ and $N_2 = C_3 M_2$ and $N_1 = C_3 M_1$, then we will have $M_3 = \alpha N_3$, $M_2 = \alpha N_2$, $M_1 = \alpha N_1$ for $\alpha = \frac{1}{C_3}$.
$$\alpha N_3 = \frac{\alpha^2(N_2-N_1)^2}{\alpha N_1} \frac{r+2}{r+1} +3 \alpha N_2 - 2 \alpha N_1 \implies N_3 = \frac{(N_2-N_1)^2}{ N_1} \frac{r+2}{r+1} +3 N_2 - 2 N_1$$
$\textbf{Example 2.}$ Consider $I(n) = (a+2bt)^n \exp(at +bt^2)$,
then $\partial_t I(n) = n (2b) (a+2bt)^{n-1} \exp(at +bt^2) + (a+2bt)^{n+1} \exp(at+bt^2) = 2bn I(n-1) + I(n+1) $
so $M_1 = \partial_t I(0) \mid_{t=0} = I(1) \mid_{t=0} = a$
$M_2 =\partial^2_t I(0) \mid_{t=0}= \partial_t I(1) \mid_{t=0} = \left(2b I(0) + I(2)\right)\mid_{t=0} = 2b +a^2 =\sigma^2 +a^2$ (so $\sigma^2 = 2b$)
$M_3 = \partial_t^3I(0) \mid_{t=0} = \partial_t\left(2b I(0) + I(2) \right) \mid_{t=0} $
Notice that $\partial_t I(0) \mid_{t=0} = a$
and $\partial_t I(2) \mid_{t=0} = \left(4bI(1)+I(3)\right)\mid_{t=0} = 4ab+a^3$
so we will have $M_3 = 2ab + 4ab + a^3 = 6ab + a^3 = a(6b+a^2)$
Notice that the third central moment of this distribution is
$$M_3 -3 M_2 M_1 +2 M_1^3 = a^3 +6ab -3(a^2 +2b)a +2a^3=0$$
So that means the skewness of this distribution (the normal) is zero, as expected from symmetry.
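We can confirm this with `sympy` directly from the mgf $\exp(at+bt^2)$:

```python
import sympy as sp

t, a, b = sp.symbols('t a b')

# mgf from Example 2 (a normal with mean a and variance 2b)
M = sp.exp(a * t + b * t**2)
M1, M2, M3 = [M.diff(t, n).subs(t, 0) for n in (1, 2, 3)]

assert M1 == a
assert sp.expand(M2) == a**2 + 2*b
assert sp.expand(M3 - 3*M2*M1 + 2*M1**3) == 0  # third central moment vanishes
```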
We can recover the distribution from its mgf by using the inverse Laplace transform (for the characteristic function, the inverse Fourier transform).