Introduction to Beta Distribution
Created on July 08, 2023
Last modified on July 22, 2023
Written by Some author
Read time: 10 minutes
Summary: The beta distribution is a continuous probability distribution whose density is normalized by the Beta function, and it is commonly encountered in the context of order statistics.
Introducing the beta distribution.
The beta distribution arises from the Beta function by parameterization. Recall that the Beta function is defined in terms of the Gamma function as follows:
$$\mathrm{B}(a,b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a+b)}$$
And so we can naturally define the beta distribution:
$$f_X(x) = \frac{1}{\mathrm{B}(a, b)}x^{a-1}(1-x)^{b-1}$$
with support $(0,1)$.
The beta distribution appears widely in the study of order statistics: if we have $n$ i.i.d. random variables and ask how the value of one of them (after sorting) is distributed, we will encounter the beta distribution.
We can first build intuition by visualizing the beta density for different parameter values.
If we adjust the parameters $a, b$, the skewness of the shape changes.
Specifically, when $a = b$, the density is symmetric (zero skewness). When $a > b$, the shape is left-skewed (meaning more concentration on the right), and when $a < b$, the distribution is right-skewed.
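To make this concrete, here is a minimal sketch of the density using only Python's standard library; the evaluation points $0.2$, $0.8$ and the parameter choices are arbitrary illustrative values:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of Beta(a, b) on (0, 1), normalized by B(a, b) = Γ(a)Γ(b)/Γ(a+b)."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

# a == b: the density takes equal values at points mirrored about 1/2
print(beta_pdf(0.2, 3, 3), beta_pdf(0.8, 3, 3))
# a > b: more mass near 1 (left-skewed)
print(beta_pdf(0.8, 5, 2) > beta_pdf(0.2, 5, 2))
```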
Now, let's dive into important characteristics of the beta distribution.
We will borrow the proof provided by statproofbook.
By definition, the moment-generating function is
$$M_X(t) = \mathbb{E}[e^{tX}] = \int_0^1 \frac{1}{\mathrm{B}(a, b)}\exp(tx)x^{a-1}(1-x)^{b-1}\, dx$$
$$= \frac{1}{\mathrm{B}(a, b)}\int_0^1\exp(tx)x^{a-1}(1-x)^{b-1}\, dx$$
We know the integral representation of the confluent hypergeometric function is the following:
$${}_1F_1(a,b,z) = \frac{1}{B(a, b-a)} \int_0^1 \exp(zx) x^{a-1}(1-x)^{b-a-1}\, dx$$
And so we can write the MGF as $$M_X(t) = {}_1 F_1(a, a+b, t)$$
We know that the series equation for confluent hypergeometric function is
$${}_1 F_1(a,b,z) = \sum_{n=0}^{\infty} \frac{a^{\overline{n}}}{b^{\overline{n}}} \, \frac{z^n}{n!} $$
where $m^{\overline{n}} = \prod_{i=0}^{n-1} (m+i)$ denotes the rising factorial.
So we will have
$${}_1 F_1(a, a+b, t) = \sum_{n=0}^{\infty} \frac{a^{\overline n}}{(a+b)^{\overline n}} \frac{t^n}{n!} = 1+\sum_{n=1}^{\infty} \frac{\prod_{i=0}^{n-1}(a+i)}{\prod_{i=0}^{n-1}(a+b+i)} \frac{t^n}{n!} = 1+\sum_{n=1}^{\infty} \prod_{i=0}^{n-1} \frac{a+i}{a+b+i} \frac{t^n}{n!} = 1 + \sum_{n=1}^{\infty} \frac{\Gamma(a+n)\Gamma(a+b)}{\Gamma(a)\Gamma(a+b+n)\Gamma(n+1)} t^n$$
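As a sanity check on the series, we can compare the truncated ${}_1F_1$ expansion against direct numerical integration of $\mathbb{E}[e^{tX}]$. This is a sketch using only the standard library; the parameters $a=2$, $b=3$, $t=1.5$ are arbitrary choices:

```python
from math import exp, gamma

def beta_mgf_series(a, b, t, terms=60):
    """M_X(t) = 1F1(a, a+b, t): sum of rising-factorial ratios times t^n/n!."""
    total, term = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) / (a + b + n) * t / (n + 1)
        total += term
    return total

def beta_mgf_integral(a, b, t, steps=100000):
    """E[exp(tX)] for X ~ Beta(a, b), via the midpoint rule on (0, 1)."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    h = 1.0 / steps
    return const * h * sum(
        exp(t * x) * x ** (a - 1) * (1 - x) ** (b - 1)
        for x in (h * (i + 0.5) for i in range(steps))
    )

print(beta_mgf_series(2, 3, 1.5), beta_mgf_integral(2, 3, 1.5))  # nearly identical
```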
To extract moments from this MGF, recall the key property of the moment-generating function: its $k$th derivative evaluated at $t=0$ equals the $k$th raw moment of $X$:
$$ \frac{d^k}{dt^k}\mathbb{E}[e^{Xt}] \bigg|_{t=0} = \mathbb{E}[X^k] $$
Applying $\frac{d^k}{dt^k}\mathbb{E}[e^{Xt}] \bigg|_{t=0} = \mathbb{E}[X^k]$ to the series expansion above, we have $$\mathbb{E}[X]= \frac{\Gamma(a+1)\Gamma(a+b)}{\Gamma(a)\Gamma(a+b+1)} = \frac{a}{a+b}$$
And $$\mathbb{E}[X^2] = \frac{\Gamma(a+2)\Gamma(a+b)}{\Gamma(a) \Gamma(a+b+2)} = \frac{a(a+1)}{(a+b)(a+b+1)} = \frac{a^2+a}{a^2 + 2ab + a + b^2 + b}$$
$$\mathbb{E}[X^3] = \frac{a(a+1)(a+2)}{(a+b)(a+b+1)(a+b+2)}= \frac{(a^2+a)(a+2)}{(a^2+2ab+a+b^2+b)(a+b+2)} = \frac{2 a + 3 a^2 + a^3}{a^3 + 3 a^2 b + 3 a^2 + 3 a b^2 + 6 a b + 2 a + b^3 + 3 b^2 + 2 b}$$
Note that the variance is calculated as
$$Var(X) = \mathbb{E}[X^2]-(\mathbb{E}[X])^2 = \frac{a(a+1)}{(a+b)(a+b+1)} - \frac{a^2}{(a+b)^2}= \frac{a(a+1)(a+b)-a^2(a+b+1)}{(a+b)^2(a+b+1)} = \frac{ab}{(a+b)^2 (a+b+1)}$$
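These closed forms are easy to check numerically. A sketch using only the standard library; the choice $a=2$, $b=5$ is arbitrary:

```python
from math import gamma

def raw_moment(a, b, k):
    """k-th raw moment of Beta(a, b): Γ(a+k)Γ(a+b) / (Γ(a)Γ(a+b+k))."""
    return gamma(a + k) * gamma(a + b) / (gamma(a) * gamma(a + b + k))

a, b = 2.0, 5.0
mean = raw_moment(a, b, 1)
var = raw_moment(a, b, 2) - mean ** 2

print(mean, a / (a + b))                           # both equal a/(a+b) = 2/7
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))   # both equal ab/((a+b)^2 (a+b+1))
```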
We now consider the moments of a generic distribution; specifically, a distribution whose MGF is
$$M_X(t) = \sum_{n=0}^{\infty} \frac{\alpha(n)}{\Gamma(n+1)} t^n,$$
where $\alpha(n)$ is the $n$-th raw moment of the distribution.
A speedy way of computing the skewness and kurtosis is to work directly with the MGF of the standardized variable: $$\mathbb{E}\left[\exp\left(\frac{(X-\mu)t}{\sigma} \right)\right] = \exp\!\left(-\frac{\mu t}{\sigma}\right) M_X\!\left(\frac{t}{\sigma}\right) = \left(\sum_{n=0}^{\infty} \frac{(-1)^n\mu^n }{\Gamma(n+1)\sigma^n}t^n\right)\left( \sum_{n=0}^{\infty} \frac{\alpha(n)}{\Gamma(n+1)\sigma^n} t^n\right)$$
Let $u(k):=\frac{d^k}{dt^k} \mathbb{E}\left[\exp\left(\frac{(X-\mu)t}{\sigma}\right)\right]\bigg|_{t=0}$ denote the $k$-th standardized moment.
So $$u(3) = \frac{6}{\sigma^3}\times\left( -\frac{\mu^3}{6} + \frac{\mu^3}{2} - \frac{\sigma^2 +\mu^2}{2} \mu + \frac{\alpha(3)}{6} \right) = \frac{\alpha(3) - 3\mu \sigma^2 - \mu^3}{\sigma^3}$$
And
$$u(4) = \frac{24}{\sigma^4} \left( \frac{\mu^4}{24} - \frac{\mu^4}{6} + \frac{\mu^2 (\sigma^2 + \mu^2)}{4} - \frac{\mu \alpha(3)}{6}+ \frac{\alpha(4)}{24}\right)$$
$$ = \frac{ 6\mu^2\sigma^2 + 3\mu^4 - 4\mu \alpha(3)+\alpha(4)}{\sigma^4}$$
Now let's compute the skewness of the beta distribution. The skewness is exactly $u(3)$:
$$\frac{\alpha(3) - 3\mu \sigma^2 - \mu^3}{\sigma^3}$$
We know that $\mu = \frac{a}{a+b}$ and $\sigma^2 = \frac{ab}{(a+b)^2(a+b+1)}$, and we also have $\alpha(3) = \frac{a (a+1) (a+2) }{ (a+b) ( a+ b+ 1) (a+b+2)}$, so plugging in gives:
$$\frac{\frac{a(a+1)(a+2)}{(a+b)(a+b+1) (a+b+2)} - \frac{3a^2 b}{(a+b)^3 ( a+b+1)}- \frac{a^3}{(a+b)^3}}{\frac{ab}{(a+b)^3 (a+b+1)} \sqrt{\frac{ab}{ a+b+1}}}$$
$$= \frac{\frac{(a+1)(a+b+2) - b(a+1)}{a+b+2} - \frac{3a b}{(a+b)^2 }- \frac{a^3+a^2b+a^2}{(a+b)^2}}{\frac{b}{(a+b)^2 } \sqrt{\frac{ab}{ a+b+1}}}$$
$$= \frac{(a+1)(a+b)^2 - \frac{b(a+1)(a+b)^2}{a+b+2}- 3ab -a^3-a^2b-a^2}{b \sqrt{\frac{ab}{ a+b+1}}}$$
$$= \frac{a(a + b)-a+b- \frac{(a+1)(a+b)^2}{a+b+2}}{\sqrt{\frac{ab}{ a+b+1}}}$$
$$= \frac{a(a + b)(a+b+2)+ (b-a)(a+b+2)- (a+1)(a+b)^2}{(a+b+2)\sqrt{\frac{ab}{ a+b+1}}}$$
$$= \frac{(a^2+ab + a + b)(a+b) +2(b-a) - (a+1)(a+b)^2}{(a+b+2)\sqrt{\frac{ab}{ a+b+1}}} = \frac{2(b-a) \sqrt{a+b+1}}{(a+b+2) \sqrt{ab}}$$
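We can verify this closed form numerically by comparing it with $u(3)$ computed from the raw moments. A sketch using only the standard library; $a=2$, $b=5$ are arbitrary choices:

```python
from math import gamma, sqrt

def raw_moment(a, b, k):
    """k-th raw moment of Beta(a, b)."""
    return gamma(a + k) * gamma(a + b) / (gamma(a) * gamma(a + b + k))

a, b = 2.0, 5.0
mu = raw_moment(a, b, 1)
var = raw_moment(a, b, 2) - mu ** 2
sigma = sqrt(var)

# u(3) from the raw moments, per the generic formula above
skew_moments = (raw_moment(a, b, 3) - 3 * mu * var - mu ** 3) / sigma ** 3
# closed form just derived
skew_closed = 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))

print(skew_moments, skew_closed)  # agree
```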
Now let's see if we can deal with kurtosis:
Recall that $$u(4) = \frac{ 6\mu^2\sigma^2 + 3\mu^4 - 4\mu \alpha(3)+\alpha(4)}{\sigma^4}$$
We also have $$u(3) = \frac{\alpha(3) - 3\mu \sigma^2 - \mu^3}{\sigma^3}$$ Rearranging, we have $\alpha(3) = u(3) \sigma^3 + \mu^3 + 3\mu\sigma^2$.
So
$$u(4) = \frac{ 6\mu^2\sigma^2 + 3\mu^4 - 4\mu (u(3) \sigma^3 + \mu^3 + 3\mu\sigma^2)+\alpha(4)}{\sigma^4}$$
$$u(4) = \frac{ \alpha(4)- 4\mu u(3) \sigma^3 - \mu^4 - 6\mu^2 \sigma^2}{\sigma^4}$$
The above derivation is for a generic distribution; now we focus on the beta distribution. In fact, we know that $\alpha(4) = \alpha(2) \times \frac{(a+2)(a+3)}{(a+b+2)(a+b+3)} = (\mu^2 + \sigma^2)\frac{(a+2)(a+3)}{(a+b+2)(a+b+3)}$, although this relation does not simplify things much. Substituting, we get the following:
$$u(4) = \frac{\frac{a(a+1)(a+2)(a+3)}{(a+b)(a+b+1)(a+b+2)(a+b+3)} - 4\frac{a}{a+b} (\frac{ab}{(a+b)^3 (a+b+1)}) \frac{2(b-a) }{(a+b+2) }-\frac{a^4}{(a+b)^4}-6\frac{a^2 ab}{(a+b)^2(a+b)^2 (a+b+1)}}{\left(\frac{ab}{(a+b)^2(a+b+1)}\right)^2}$$
$$u(4) = \frac{\frac{a(a+1)(a+2)(a+3)(a+b)^3}{(a+b+2)(a+b+3)} - \frac{8a^2b(b-a)}{(a+b+2)} -a^4 (a+b+1)-6a^2 ab}{\left(\frac{a^2b^2}{(a+b+1)}\right)}$$
$$u(4) = \frac{\frac{a(a+1)(a+2)(a+b+3)(a+b)^3 - a(a+1)(a+2)b(a+b)^3}{(a+b+2)(a+b+3)} - \frac{8a^2b(b-a)}{(a+b+2)} -a^4 (a+b+1)-6a^2 ab}{\left(\frac{a^2b^2}{(a+b+1)}\right)}$$
$$u(4) = \frac{\frac{a(a+1)(a+b+2)(a+b)^3 - a(a+1)b(a+b)^3}{a+b+2} - \frac{a(a+1)(a+2)b(a+b)^3}{(a+b+2)(a+b+3)} - \frac{8a^2b(b-a)}{(a+b+2)} -a^4 (a+b+1)-6a^2 ab}{\left(\frac{a^2b^2}{(a+b+1)}\right)}$$
$$u(4) = \frac{a b (2 a^3 + 3 a^2 b - 3 a^2 + a b^2 + 3 a b + b^2)-\frac{ a(a+1)b(a+b)^3}{a+b+2} - \frac{a(a+1)(a+2)b(a+b)^3}{(a+b+2)(a+b+3)} - \frac{8a^2b(b-a)}{(a+b+2)} }{\left(\frac{a^2b^2}{(a+b+1)}\right)}$$
$$u(4) = \frac{2 a^3 + 3 a^2 b - 3 a^2 + a b^2 + 3 a b + b^2-\frac{ (a+1)(a+b)^3}{a+b+2} - \frac{(a+1)(a+2)(a+b)^3}{(a+b+2)(a+b+3)} - \frac{8a(b-a)}{(a+b+2)} }{\left(\frac{ab}{(a+b+1)}\right)}$$
$$u(4) = \frac{2 a^2 + a^4 - 2 a b + 3 a^2 b + 2 a^3 b + 2 b^2 + 3 a b^2 + a^2 b^2 - \frac{(a+1)(a+2)(a+b)^3}{(a+b+3)} }{\frac{ab(a+b+2)}{a+b+1}}$$
...
The algebra is getting too complicated here, and there may be a miscalculation; if you spot any error, please let me know.
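Even without a tidy closed form, the expression for $u(4)$ in terms of $\alpha(4)$, $u(3)$, $\mu$, and $\sigma$ can be checked against a direct numerical integration of $\mathbb{E}[((X-\mu)/\sigma)^4]$. A standard-library sketch, with $a=2$, $b=5$ as arbitrary parameters:

```python
from math import gamma, sqrt

def raw_moment(a, b, k):
    """k-th raw moment of Beta(a, b)."""
    return gamma(a + k) * gamma(a + b) / (gamma(a) * gamma(a + b + k))

a, b = 2.0, 5.0
mu = raw_moment(a, b, 1)
var = raw_moment(a, b, 2) - mu ** 2
sigma = sqrt(var)
u3 = (raw_moment(a, b, 3) - 3 * mu * var - mu ** 3) / sigma ** 3
# u(4) = (alpha(4) - 4*mu*u(3)*sigma^3 - mu^4 - 6*mu^2*sigma^2) / sigma^4
u4 = (raw_moment(a, b, 4) - 4 * mu * u3 * sigma ** 3 - mu ** 4 - 6 * mu ** 2 * var) / var ** 2

# cross-check: integrate ((x - mu)/sigma)^4 against the beta density (midpoint rule)
const = gamma(a + b) / (gamma(a) * gamma(b))
steps = 100000
h = 1.0 / steps
u4_numeric = const * h * sum(
    ((x - mu) / sigma) ** 4 * x ** (a - 1) * (1 - x) ** (b - 1)
    for x in (h * (i + 0.5) for i in range(steps))
)

print(u4, u4_numeric)  # agree
```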
Now let's see the relation between the beta distribution and order statistics. Assume we have $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$ drawn from a continuous distribution and sorted in increasing order. Since the distribution is continuous, ties occur with probability zero, so we may take the values to be distinct. We want to ask how likely it is that $X_k = x$; that is, we want the density of the $k$-th smallest value.
The event $X_k \le x$ occurs exactly when at least $k$ of the $n$ draws are at most $x$, so $$g_k(x) = \Pr[X_k \le x] = \sum_{i=k}^n {n \choose i}F_X(x)^i [1-F_X(x)]^{n-i}$$
So $$g_k(x) - g_{k+1}(x) = { n \choose k} F_X(x)^k [1-F_X(x)]^{n-k}$$
Note that when $k = 1$, the event is simply that not every element exceeds $x$, so $g_1(x) = 1 - (1-F_X(x))^{n}$.
And by taking the derivative, we will have:
$$f_1(x) = g_1'(x) = nf_X(x)(1-F_X(x))^{n-1}$$
and differentiating $g_k(x) - g_{k+1}(x) = { n \choose k} F_X(x)^k [1-F_X(x)]^{n-k}$ with respect to $x$ as well, we get
$$f_{k+1}(x) = f_k(x) - {n\choose k}(kF_X(x)^{k-1} (1-F_X(x))^{n-k} -(n-k)F_X(x)^k (1-F_X(x))^{n-k-1})f(x)$$
$$f_{k+1}(x) = f_k(x) - \frac{n!}{k! (n-k)!}(kF_X(x)^{k-1} (1-F_X(x))^{n-k} -(n-k)F_X(x)^k (1-F_X(x))^{n-k-1})f(x)$$
$$f_{k+1}(x)= f_k(x) - \frac{n!}{(k-1)! (n-k)!}F_X(x)^{k-1} (1-F_X(x))^{n-k}f(x) +\frac{n!}{k! (n-k-1)!}F_X(x)^k (1-F_X(x))^{n-k-1}f(x)$$
We also notice that
$$f_{k}(x) = f_{k-1}(x)- \frac{n!}{(k-2)! (n-k+1)!}F_X(x)^{k-2} (1-F_X(x))^{n-k+1}f(x) +\frac{n!}{(k-1)! (n-k)!}F_X(x)^{k-1} (1-F_X(x))^{n-k}f(x)$$
...
So $f_{k+1}(x) = f_{1}(x) - n f_X(x) (1-F_X(x))^{n-1} +\frac{n!}{k! (n-k-1)!}F_X(x)^{k} (1-F_X(x))^{n-k-1}f(x)$
So when $k=1$, we will have
$$f_2(x) = nf_X(x)(1-F_X(x))^{n-1} - n((1-F_X(x))^{n-1} -(n-1)F_X(x) (1-F_X(x))^{n-2})f(x)$$
$$f_2(x) = n(n-1)F_X(x) (1-F_X(x))^{n-2}f(x)$$
...
So $$f_{k+1}(x) = \frac{n!}{k! (n-k-1)!}F_X(x)^{k} (1-F_X(x))^{n-k-1}f(x)$$ or, shifting the index, $$f_{k}(x) = \frac{n!}{(k-1)! (n-k)!}F_X(x)^{k-1} (1-F_X(x))^{n-k}f(x)$$
The derivation can be found here.
A more intuitive way to remember this: one element is pinned at $x$, contributing $f_X(x)$; $k-1$ elements fall below $x$ and $n-k$ fall above, contributing $F_X(x)^{k-1}$ and $(1-F_X(x))^{n-k}$; and there are $\frac{n!}{(k-1)!(n-k)!}$ ways to assign which elements play which role.
We can now see that this density is that of a beta distribution: when the underlying distribution is uniform over $(0,1)$, so that $F_X(x) = x$, the $k$-th order statistic follows a beta distribution with parameters $a = k$ and $b = n-k+1$.
Here are some computer simulations that allow us to verify the result above:
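As one such simulation, here is a sketch using only the standard library: we draw $n$ uniforms, take the $k$-th smallest, and compare the empirical mean with the Beta$(k, n-k+1)$ mean $k/(n+1)$. The choices $n=7$, $k=3$ are arbitrary:

```python
import random

random.seed(0)  # reproducible
n, k, trials = 7, 3, 100000

# k-th smallest of n i.i.d. Uniform(0,1) draws, repeated many times
samples = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(trials)]

emp_mean = sum(samples) / trials
theory_mean = k / (n + 1)  # mean of Beta(a, b) with a = k, b = n - k + 1

print(emp_mean, theory_mean)  # empirical mean close to 3/8
```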