Why is the Central Limit Theorem (CLT) not working?
If you have ever taken a statistics class, at any level, chances are high that you came across the Central Limit Theorem[^1] (CLT). It states that the mean of any long sequence of independently and identically distributed (i.i.d.) random variables, whatever the underlying distribution, is approximately normally distributed, provided the distribution has finite variance. The statement is powerful because it is distribution-free, and it plays such a central role in probability theory that Pólya (1920) named it the Central Limit Theorem. The CLT is often applied to obtain a normal approximation for further inference, for instance, confidence intervals. However, every time we use it we need to check that the theorem's assumptions are satisfied; otherwise, as will be shown in this post, the normal approximation does not hold. The content is based on the book *Elements of Large Sample Theory*.
1. Distribution variance needs to be finite
🌰 Counterexample: Cauchy distribution \(C(0,1)\)[^2]
The mean[^3] and variance[^3] of the Cauchy distribution are not defined. When the variables \(X_1,\dots,X_n\) are i.i.d. according to the Cauchy distribution, the sample mean \(\bar{X}\) has the same distribution as a single \(X_i\); that is, it is again a Cauchy distribution[^4][^5]. Hence, \(\bar{X}\) converges in distribution to a Cauchy law instead of being asymptotically normal.
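This is easy to see empirically. Below is a minimal NumPy/SciPy sketch (the sample size, number of replications, and seed are illustrative choices of mine, not from the book): the simulated sample means pass a Kolmogorov-Smirnov test against \(C(0,1)\), and their spread does not shrink despite the averaging.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5_000, 2_000  # illustrative choices

# Sample means of n i.i.d. standard Cauchy variables
xbar = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

# The empirical distribution of xbar should be indistinguishable from C(0, 1):
# the KS test against the standard Cauchy CDF should not reject, and the IQR
# of the sample means stays near 2, the IQR of a single observation.
print(stats.kstest(xbar, stats.cauchy.cdf))
print("IQR of sample means:", np.subtract(*np.percentile(xbar, [75, 25])))
```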
2. Distribution needs to be independent of sample size \(n\)
🌰 Counterexample: Poisson distribution \(Poi(\lambda=\frac{1}{n})\)
Let \(X_1,\dots,X_n\) be i.i.d. Poisson \(Poi(\lambda)\) with \(\lambda=\frac{1}{n}\), so the mean is \(\mu=\frac{1}{n}\) and the variance is \(\sigma^2=\frac{1}{n}\). Consequently, \(\sum_{i=1}^nX_i\) is distributed as \(Poi(1)\). Write $$ \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} = \frac{\sum_{i=1}^nX_i-n\mu}{\sqrt{n}\sigma} = \sum_{i=1}^nX_i-1. $$ Therefore, \(\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}\) has the distribution of \(Y-1\), where \(Y \sim Poi(1)\), so it is not asymptotically normal.
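A small simulation sketch of the same calculation (NumPy/SciPy; the values of \(n\), the replication count, and the seed are illustrative assumptions): the standardized sample mean stays on the lattice \(-1, 0, 1, \dots\) with \(Poi(1)\) frequencies rather than approaching \(N(0,1)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000  # illustrative choices

# Each row: n i.i.d. Poi(1/n) observations; standardize the sample mean
# with mu = sigma^2 = 1/n as in the text.
x = rng.poisson(lam=1.0 / n, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0 / n) / np.sqrt(1.0 / n)  # equals sum(x) - 1

# The standardized mean only takes the values -1, 0, 1, 2, ... and its
# frequencies match the Poi(1) pmf (shifted by one), so it never approaches N(0, 1).
vals, counts = np.unique(np.rint(z), return_counts=True)
print(dict(zip(vals.astype(int), (counts / reps).round(4))))
print("Poi(1) pmf at 0..4:", stats.poisson.pmf(np.arange(5), mu=1).round(4))
```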
3. CLT + Delta Method
The CLT is often used together with the Delta Method, which states that if \( \sqrt{n}(T_n-\theta) \rightarrow N(0, \tau^2) \), then \(\sqrt{n}(f(T_n)-f(\theta)) \rightarrow N(0, \tau^2 f'(\theta)^2) \), provided that \(f'(\theta)\) exists and is not zero. Again, the assumptions need to be satisfied; otherwise, the limiting distribution of \(f(T_n)\) will not be normal.
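When the assumptions do hold, the prediction is easy to check numerically. Here is a minimal sketch (NumPy/SciPy; the choice of \(f(x)=x^2\), exponential data, \(n\), and the replication count are illustrative assumptions, not from the book) verifying that \(\sqrt{n}(T_n^2-1)\) is approximately \(N(0,4)\) when \(T_n\) is the mean of \(Exp(1)\) data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 2_000, 2_000  # illustrative choices

# T_n = sample mean of Exp(1) data, so theta = 1 and tau^2 = 1.
# With f(x) = x^2 and f'(theta) = 2, the Delta Method predicts
# sqrt(n) * (T_n^2 - 1) -> N(0, tau^2 * f'(theta)^2) = N(0, 4).
t_n = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (t_n**2 - 1.0)
print(stats.kstest(z, stats.norm(loc=0, scale=2).cdf))  # typically a large p-value
```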
🌰 Counterexample: \(f(\theta)=|\theta|\)
Suppose \(T_n\) is a sequence of statistics satisfying \( \sqrt{n}(T_n-\theta) \rightarrow N(0, \tau^2) \) and that we are interested in the limiting behavior of \(|T_n|\). The function \(f(\theta)=|\theta|\) is differentiable with derivative \(f'(\theta)=\pm 1\) for all \(\theta \neq 0\), so by the Delta Method, $$ \sqrt{n}(|T_n|-|\theta|) \rightarrow N(0, \tau^2), \quad \theta \neq 0. $$
When \(\theta=0\), the Delta Method does not apply, but we can determine the limiting behavior of \(|T_n|\) directly. Since \(|T_n|-|\theta|=|T_n|\), $$ P[\sqrt{n}|T_n|<a] = P[-a < \sqrt{n}T_n < a] \rightarrow \Phi\left(\frac{a}{\tau}\right) - \Phi\left(-\frac{a}{\tau}\right) = P(\tau\chi_1<a), $$ where \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution, and \(\chi_1=\sqrt{\chi_1^2}\) denotes the absolute value of a standard normal variable. Hence the limiting distribution of \(\sqrt{n}|T_n|\) is that of \(\tau\chi_1\), a scaled \(\chi_1\) (half-normal) distribution, not a normal one.
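The same conclusion shows up in simulation. A minimal sketch (NumPy/SciPy; taking \(T_n\) as the mean of \(N(0,\tau^2)\) data so that \(\theta=0\), with \(\tau\), \(n\), and the replication count as illustrative assumptions): \(\sqrt{n}|T_n|\) matches a half-normal (scaled \(\chi_1\)) distribution and is rejected as normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, tau = 500, 10_000, 2.0  # illustrative choices

# T_n = sample mean of n i.i.d. N(0, tau^2) variables, so theta = 0 and
# sqrt(n) * (T_n - theta) is N(0, tau^2) for every n.
t_n = rng.normal(0.0, tau, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * np.abs(t_n)

# sqrt(n)|T_n| follows tau * chi_1 (a half-normal with scale tau),
# not a normal distribution.
print(stats.kstest(scaled, stats.halfnorm(scale=tau).cdf))   # should not reject
print(stats.kstest(scaled, stats.norm(loc=scaled.mean(),
                                      scale=scaled.std()).cdf))  # rejects
```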
[^1]: Classic CLT: Let \(X_i, i=1,2,\dots\) be independent and identically distributed with mean \(E(X_i)=\mu\) and variance \(Var(X_i)=\sigma^2<\infty\). Then \(\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}\rightarrow N(0,1)\) in distribution.