On the behaviour of high-dimensional normal distributions

It is quite intuitive that random normal vectors cluster around the origin. However, this intuition breaks down in higher dimensions: the concentration properties of high-dimensional normal distributions deviate heavily from our low-dimensional picture. In this post we investigate where in $\mathcal{R}^n$ a high-dimensional random normal vector is likely to be located.

Low-dimensional Normal Distributions

The normal distribution is one of the most widely used probability distributions in statistics and machine learning. Our experience with Gaussian distributions in low dimensions tells us that the density follows a bell curve, and a value drawn from the 1-dimensional $\mathcal{N}(0, 1)$ distribution with mean zero and variance one is very likely to be close to the origin.

We can check this empirically. Random draws from $\mathcal{N}(0,1)$ have a high chance of being close to zero as is evident in the histogram in Figure 1.

Code

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# draw 20,000 samples from N(0, 1) and plot a histogram
x = np.random.randn(20000)
ax = sns.histplot(x)  # sns.distplot is deprecated in recent seaborn versions
ax.set(xlabel='x', ylabel='count')
plt.show()

Figure 1: Values of a 1-dimensional normal random variable cluster around the origin.

Now, let’s check whether this observation holds in two dimensions.

Notation: an “$n$-dimensional normal random vector” $x$ is a random vector in $\mathcal{R}^{n}$ whose coordinates $x_i$ are independent random variables sampled from $\mathcal{N}(0,1)$. The distribution of $x$ is denoted $\mathcal{N}(0, I_n)$, where $I_n$ is the $n \times n$ identity matrix.

Code

# draw 30,000 samples from N(0, I_2) and plot their joint density
x = np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=30000)
ax = sns.jointplot(x=x[:,0], y=x[:,1], kind="kde", cbar=True)
plt.show()

Figure 2: Values of a 2-dimensional normal random variable concentrate around the origin.

It is clear that values are tightly concentrated around the origin in low-dimensional cases.

High-dimensional Normal Distributions

As we have seen so far, random normal vectors in low dimensions have a high probability of being close to the origin. However, as we will now observe empirically, this property no longer holds as we move to higher dimensions.

First, let’s discuss the Euclidean norm of a vector $x \in \mathcal{R}^n$. The distance of a vector $x$ from the origin in Euclidean space is given by

$$ ||x||_2 = \sqrt{x^{2}_1 + x^{2}_2 + \dots + x^{2}_n} $$

which is the Euclidean norm of $x$.

Let’s check the Euclidean norms of random normal vectors in $\mathcal{R}^{100}$.

Code

# draw 30,000 samples from N(0, I_100) and compute their Euclidean norms
x = np.random.multivariate_normal(mean=np.zeros(100), cov=np.eye(100), size=30000)
lengths = np.linalg.norm(x, axis=1)
plt.hist(lengths, bins=50)
plt.xlabel("Lengths")
plt.ylabel("Count")
plt.show()

Figure 3: Lengths of 30,000 100-dimensional normal random vectors cluster around 10.

Quite unexpectedly, the majority of vectors have length around $10 = \sqrt{100}$. Why is that? This goes against our intuition of random normal vectors being centered around zero.

Well, as it turns out, there is a very elegant theorem to explain why high-dimensional normal vectors are not centered around zero.


Theorem: Let $Z$ be a random vector in $\mathcal{R}^n$ with independent $\mathcal{N}(0,1)$ coordinates. Then $$P(| \, ||Z||_2 - \sqrt{n} \, | \geq t) \leq 2 \exp(-ct^2),$$ where $c>0$ is a constant, $t\geq0$ and $|| . ||_2$ is the Euclidean vector norm.


Essentially, this theorem says that the probability of a random normal vector landing far away from the sphere of radius $\sqrt{n}$ decays exponentially fast, i.e., a high-dimensional random normal vector is very likely to lie near a sphere of radius $\sqrt{n}$. This theorem has a beautiful proof in [1].
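We can also probe this bound empirically. The following sketch (with hypothetical sample sizes) estimates $P(| \, ||Z||_2 - \sqrt{n} \, | \geq t)$ for a few values of $t$:

Code

import numpy as np

n, samples = 100, 30000
Z = np.random.randn(samples, n)  # rows are independent N(0, I_n) vectors
# deviation of each norm from sqrt(n)
dev = np.abs(np.linalg.norm(Z, axis=1) - np.sqrt(n))

for t in [0.5, 1.0, 2.0, 3.0]:
    # empirical estimate of P(| ||Z||_2 - sqrt(n) | >= t)
    print(f"t = {t}: P = {np.mean(dev >= t):.4f}")

The estimated probabilities should fall off rapidly as $t$ grows, consistent with the $2\exp(-ct^2)$ bound.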

To understand a bit more intuitively why this happens, let’s look at the Euclidean norm again. $$ ||x||_2 = \sqrt{(0 + \epsilon_1)^{2} + (0 + \epsilon_2)^{2} + \dots + (0 + \epsilon_n)^{2}} $$ Here, $\epsilon_i$ is the deviation of $x_i$ from zero. Each entry of $x$ is zero on average because it is $\mathcal{N}(0,1)$. These deviations from zero are insignificant in low dimensions. However, as the dimension grows, the individually insignificant deviations add up, resulting in a large Euclidean norm, so $x$ ends up far away from the origin. Another way to think about this is that in a high-dimensional space, there is more room for each component of $x$ to move.
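The sketch below makes this concrete (with arbitrarily chosen dimensions and sample counts): it plots the average Euclidean norm of standard normal vectors against the dimension $n$, which should closely track $\sqrt{n}$.

Code

import numpy as np
import matplotlib.pyplot as plt

dims = [1, 2, 5, 10, 50, 100, 500, 1000]
# average norm over 1,000 samples for each dimension
avg_norms = [np.mean(np.linalg.norm(np.random.randn(1000, n), axis=1)) for n in dims]

plt.plot(dims, avg_norms, "o-", label="average norm")
plt.plot(dims, np.sqrt(dims), "--", label="sqrt(n)")
plt.xlabel("dimension n")
plt.ylabel("norm")
plt.legend()
plt.show()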

Takeaways:

  • A high-dimensional random normal vector is tightly concentrated around a sphere of radius $\sqrt{n}$.

  • $||x||_2 \approx \sqrt{n}$ with high probability

The following figure, taken from the amazing book [1] by Roman Vershynin, provides a high-level intuition of what we mean by vectors situated on a sphere of radius $\sqrt{n}$.

Figure 4: Gaussian point cloud in 2D (left) and its intuitive high-dimensional visualisation.

With this result in hand, it is now quite logical and intuitive that high-dimensional random normal vectors are not centered around zero.

Another observation is that the vectors are pushed away from the origin equally in all directions, forming a spherical shell. To understand why this happens, we need to look at the interplay of orthogonal matrices and random normal vectors.

Consider a random vector $x \sim \mathcal{N}(0,I_n)$ and an orthogonal matrix $U$. Then, $$Ux \sim \mathcal{N}(0,I_n)$$

This property is known as rotation invariance: $Ux$ has the same distribution as $x$. We can check this empirically as well.

from scipy.stats import ortho_group
np.set_printoptions(suppress=True)

# 100 samples from N(0, I_100), one per row
x = np.random.multivariate_normal(mean=np.zeros(100), cov=np.eye(100), size=100)
U = ortho_group.rvs(100)  # a random 100 x 100 orthogonal matrix
Ux = x.dot(U.T)           # rotate each sample (row) by U

print("Mean:", np.mean(Ux))
print("Std dev:", np.std(Ux))

Output

Mean: 0.0008
Std dev: 0.9763

As we can observe from the output above, the mean is close to 0 and the standard deviation close to 1, consistent with $Ux \sim \mathcal{N}(0, I_n)$.
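The mean and standard deviation alone don’t pin down a distribution, so as an extra sanity check (a sketch using SciPy’s Kolmogorov–Smirnov test) we can compare the entries of $Ux$ against the standard normal CDF; a p-value that is not tiny is consistent with $Ux \sim \mathcal{N}(0, I_n)$.

Code

from scipy.stats import kstest

# compare the flattened entries of Ux with the standard normal CDF
stat, pvalue = kstest(Ux.ravel(), "norm")
print("KS statistic:", stat)
print("p-value:", pvalue)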

In order to show why random normal vectors concentrate on a sphere of radius around $\sqrt{n}$, represent $x \sim \mathcal{N}(0,I_n)$ in polar form as $$ x = r\theta $$ where $r = ||x||_2$ is the length of the vector and $\theta = x/||x||_2$ is its direction.

We have already seen that $r = ||x||_2 \approx \sqrt{n}$ with high probability, so $$ x \approx \sqrt{n} \theta $$

Now, $||\theta||_2 = 1$ implies that $\theta$ lies on the unit sphere $\mathcal{S}^{n-1}$.

Since $\theta$ is itself a random vector, the rotation invariance property shown above implies that its distribution is rotationally invariant: the directions are spread equally across $\mathcal{S}^{n-1}$. The only such distribution is the uniform distribution on $\mathcal{S}^{n-1}$.

More formally, $$ x \approx \sqrt{n} \theta \sim Unif(\sqrt{n}\mathcal{S}^{n-1}) $$
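We can verify this numerically. The sketch below (with hypothetical sample sizes) draws Gaussian vectors, normalizes them into directions $\theta$, and checks that the points $\sqrt{n}\,\theta$ on the sphere of radius $\sqrt{n}$ stay close to the original samples:

Code

import numpy as np

n, samples = 100, 10000
x = np.random.randn(samples, n)
r = np.linalg.norm(x, axis=1, keepdims=True)  # lengths r = ||x||_2
theta = x / r                                 # directions on the unit sphere
x_approx = np.sqrt(n) * theta                 # projections onto the sphere of radius sqrt(n)

# relative error between each x and its spherical approximation
rel_err = np.linalg.norm(x - x_approx, axis=1) / np.linalg.norm(x, axis=1)
print("mean relative error:", np.mean(rel_err))  # small, since r is close to sqrt(n)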

All of this leads us to the elegant observation that Gaussian distributions in high dimensions are close to the uniform distribution on the sphere of radius $\sqrt{n}$.

$$\mathcal{N}(0, I_n) \approx Unif(\sqrt{n}\mathcal{S}^{n-1})$$

This result has some interesting implications:

  1. Our minds have a hard time grasping concepts in high dimensions, and intuitions developed in low-dimensional spaces may fail in higher dimensions.
  2. Basic statistics like the mean, mode, etc. may give unexpected results for high-dimensional distributions (see the sketch below).
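As a quick illustration of the second point (a sketch with arbitrarily chosen sample sizes): the density of $\mathcal{N}(0, I_{100})$ is maximized at the origin, yet among tens of thousands of samples essentially none land anywhere near it.

Code

import numpy as np

x = np.random.randn(30000, 100)
norms = np.linalg.norm(x, axis=1)
# the density peaks at the origin, yet no sample comes close to it
print("closest sample to origin:", norms.min())         # typically around 7 to 8
print("fraction within radius 5:", np.mean(norms < 5))  # essentially zero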