Decompose correlation matrix with block structure

Thu, Apr 1, 2021 9-minute read algebra

 

I recently read an interesting paper1 about how to sample from a multivariate normal distribution with lower computational cost, and I really like the part where they exploit the block structure of the correlation matrix to carry out the eigen-decomposition. In this post, I will explain the ideas and present the techniques applied in the paper.

1. Background

Sampling from a multivariate normal distribution is very common in simulation studies. Suppose that \(X=(X_1, X_2,…, X_n)\) is a random vector following a multivariate normal distribution \(N(0, \Sigma)\), where \(\Sigma\) is the covariance matrix of size \(n\times n\). The standard sampling method applies the Cholesky factorization to the covariance matrix \(\Sigma\) to get \(\Sigma = LL^T\), where \(L\) is a lower triangular matrix. A sample is then obtained as \(\tilde{X}=LZ\), where \(Z\) is an \(n\times 1\) vector of independent standard normal random variables.
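For reference, here is a minimal NumPy sketch of the standard approach (the function name and setup are mine, not from the paper):

```python
import numpy as np

def sample_mvn_cholesky(sigma, rng=None):
    """Draw one sample from N(0, sigma) via Cholesky factorization."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(sigma)             # sigma = L @ L.T, O(n^3) once
    z = rng.standard_normal(sigma.shape[0])   # n independent standard normals
    return L @ z                              # O(n^2) per sample
```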

A correlation matrix with block structure often arises when the random variables belong to non-overlapping groups and the correlations are determined solely by the group labels. For instance, in finance, a portfolio usually consists of obligors that can be divided into groups by industry, region, etc.; in gene co-expression data, genes can be categorized into functional groups, where genes in the same group are usually expressed concurrently. In both examples, the correlations between variables are an important component of understanding their overall behavior, and the correlations are often assumed to depend only on group-level information.

2. 🎈 Key Idea 🎈

When the random vector \(X\) is of size \(n\), drawing one sample from the multivariate normal distribution using the Cholesky factorization takes \(O(n^2)\)2 time. The new sampling method proposed in the paper reduces the time to \(O(K^2)+O(n)\), provided that the correlation matrix is positive semi-definite and has a block structure, where \(K\) is the number of non-overlapping groups. The strategy is to first eigen-decompose \(\Sigma\) as \(\Sigma=UDU^T\), where \(U\) is the orthogonal3 matrix composed of eigenvectors of \(\Sigma\) and \(D\) is a diagonal matrix with the corresponding eigenvalues along the diagonal, and then draw the sample as \(\tilde{X}=U\sqrt{D}Z\). Here comes the key idea: with the block structure, eigenvectors of \(\Sigma\) can be constructed from the bases of a set of predefined subspaces spanning \(R^n\). The eigenvalues of those eigenvectors are easily calculated, and the eigenvectors give \(U\) a simple structure that reduces the computational cost of \(U\sqrt{D}Z\). In the next sections, we break the idea down into detailed steps.
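As a baseline, eigen-decomposition-based sampling can be sketched as follows, without exploiting any block structure yet (a rough sketch; the names are mine):

```python
import numpy as np

def sample_mvn_eigen(sigma, rng=None):
    """Draw one sample from N(0, sigma) via the eigen-decomposition sigma = U D U^T."""
    rng = np.random.default_rng() if rng is None else rng
    eigvals, U = np.linalg.eigh(sigma)        # dense decomposition, O(n^3)
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative round-off
    z = rng.standard_normal(sigma.shape[0])
    return U @ (np.sqrt(eigvals) * z)         # X = U sqrt(D) Z
```

The sections below show how the block structure makes both the decomposition and the product \(U\sqrt{D}Z\) cheap.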

3. Notation

Let \(X=(X_1, X_2,…, X_n)\) be a vector of Gaussian random variables. Without loss of generality, assume that each \(X_i\) has zero mean and unit variance, so that the covariance matrix equals the correlation matrix, i.e. \(\Sigma=\text{Cor}\). Suppose we can divide the random variables into \(K\) non-overlapping groups and write \(X=(X^{(1)}, X^{(2)},…, X^{(K)} )\), where \(X^{(k)}=(X^{(k)}_1,…,X^{(k)}_{N_k})\) is the vector of random variables belonging to group \(k\), \(1 \leq k \leq K\). Consider the case where the correlation between two distinct variables depends only on the groups they belong to, that is, \(\text{Cor}(X^{(k)}_i,X^{(l)}_j )=\rho_{kl}\). The correlation of a variable with itself is always \(1\). Therefore, the correlation matrix has the block structure presented in (1).

$$\tag{1} \Sigma = \left[ \def\arraystretch{1.5} \begin{array}{cccc:ccc:cc:ccc} 1 & \rho_{11} & \cdots & \rho_{11} & \rho_{12} &\cdots & \rho_{12} & \cdots & \cdots & \rho_{1K} & \cdots & \rho_{1K} \\ \rho_{11} & 1 & \cdots & \rho_{11} & \rho_{12} &\cdots & \rho_{12} & \cdots & \cdots & \rho_{1K} & \cdots & \rho_{1K} \\ \vdots & \vdots & \vdots & \vdots & \vdots &\vdots & \vdots & \vdots & \vdots& \vdots& \vdots & \vdots \\ \rho_{11} & \rho_{11} & \cdots &1 & \rho_{12} &\cdots & \rho_{12} & \cdots & \cdots & \rho_{1K} & \cdots & \rho_{1K} \\ \hdashline \rho_{21} & \rho_{21} & \cdots & \rho_{21} & 1 & \cdots & \rho_{22} & \cdots & \cdots & \rho_{2K} & \cdots & \rho_{2K} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \rho_{21} & \rho_{21} & \cdots & \rho_{21} & \rho_{22} & \cdots &1 & \cdots & \cdots & \rho_{2K} & \cdots & \rho_{2K} \\ \hdashline \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \hdashline \rho_{K1} & \rho_{K1} & \cdots & \rho_{K1} & \rho_{K2} & \cdots & \rho_{K2} & \cdots & \cdots & 1 & \cdots & \rho_{KK} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \rho_{K1} & \rho_{K1} & \cdots & \rho_{K1} & \rho_{K2} & \cdots &\rho_{K2} & \cdots & \cdots & \rho_{KK} & \cdots & 1 \end{array} \right] $$
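To make the structure concrete, here is a small NumPy sketch that builds \(\Sigma\) from the group sizes \(N_k\) and the matrix of correlations \(\rho_{kl}\) (the helper name and the example values are mine):

```python
import numpy as np

def block_correlation(sizes, rho):
    """Build the block-structured correlation matrix (1) from group sizes and rho[k, l]."""
    sizes = np.asarray(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)   # group label of each variable
    sigma = rho[np.ix_(labels, labels)].astype(float)  # fill rho_{kl} block-wise
    np.fill_diagonal(sigma, 1.0)                       # unit variance on the diagonal
    return sigma

# Example: K = 3 groups of sizes 4, 3, 2 (n = 9)
rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.4, 0.3],
                [0.1, 0.3, 0.6]])
sigma = block_correlation([4, 3, 2], rho)
```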

4. Define subspaces \(\lbrace V_0, V_1,…, V_K\rbrace\) that span \(R^n\)

For \(k=1,…,K\), let \(V_k\) be the subspace of \(R^n\) whose elements have the following form: $$ \hat{x} = ( \underbrace{0…0}_{N_1}, \cdots, \underbrace{0…0}_{N_{k-1}}, \underbrace{x_1,…,x_{N_k}}_{N_{k}} , \underbrace{0…0}_{N_{k+1}} , \cdots, \underbrace{0…0}_{N_K})^T \quad \text{where}\quad x_1+x_2+\cdots + x_{N_k} = 0 $$

Let \(V_0\) be the subspace of \(R^n\) such that its elements have the following form: $$ \tag{2} \hat{x} = ( \underbrace{x_1,…,x_1}_{N_1}, \underbrace{x_2,…,x_2}_{N_2}, \cdots,\underbrace{x_K,…,x_K}_{N_K})^T $$

For \(x=(x_1,…,x_K)\), define a mapping \(P\) from \(R^K\) to \(R^n\) by \(P(x) = \frac{\hat{x}}{||\hat{x}||_2}\), where \(\hat{x}\) is of the form (2).
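In code, the mapping \(P\) simply repeats each \(x_k\) \(N_k\) times and normalizes (a short sketch, assuming NumPy; the function name is mine):

```python
import numpy as np

def P(x, sizes):
    """Map x in R^K to the unit vector in R^n that repeats x_k over group k, as in (2)."""
    xhat = np.repeat(np.asarray(x, dtype=float), sizes)
    return xhat / np.linalg.norm(xhat)
```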

📌 It can be proved that the subspaces \(V_k, k=0,1,…,K\) are mutually orthogonal and together they span \(R^n\). For each \(k\neq 0\), any element \(v\) of \(V_k\) is an eigenvector of \(\Sigma\) with eigenvalue \(1-\rho_{kk}\), i.e. \(\Sigma v = (1-\rho_{kk})v\).
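The eigenvector claim for \(V_k\) is easy to verify numerically; the sketch below rebuilds the example \(\Sigma\) from above and checks it for a zero-sum vector supported on the first group (the test values are arbitrary):

```python
import numpy as np

rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.4, 0.3],
                [0.1, 0.3, 0.6]])
sizes = [4, 3, 2]
labels = np.repeat(np.arange(3), sizes)
sigma = rho[np.ix_(labels, labels)].astype(float)
np.fill_diagonal(sigma, 1.0)

v = np.zeros(9)
v[:4] = [1.0, -2.0, 0.5, 0.5]                        # supported on group 1, entries sum to zero
assert np.allclose(sigma @ v, (1 - rho[0, 0]) * v)   # eigenvalue 1 - rho_11
```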

5.1 Find orthonormal bases of the subspaces \(V_k, k =1,…, K\)

Define the matrix \(F_m\) of size \(m\times (m-1)\), \(m>1\). $$ F_{m}=\begin{bmatrix} \frac{1}{\sqrt{1\cdot2}} & \frac{1}{\sqrt{2\cdot3}} & \frac{1}{\sqrt{3\cdot4}} & \cdots & \frac{1}{\sqrt{(m-1)\cdot m}} \\ -\frac{1}{\sqrt{1\cdot2}} & \frac{1}{\sqrt{2\cdot3}} & \frac{1}{\sqrt{3\cdot4}} & \cdots & \frac{1}{\sqrt{(m-1)\cdot m}} \\ 0 & -\frac{2}{\sqrt{2\cdot3}} & \frac{1}{\sqrt{3\cdot4}} & \cdots & \frac{1}{\sqrt{(m-1)\cdot m}} \\ 0 & 0 & -\frac{3}{\sqrt{3\cdot4}} & \cdots & \frac{1}{\sqrt{(m-1)\cdot m}} \\ 0 & 0 & 0 & \cdots & \frac{1}{\sqrt{(m-1)\cdot m}} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & - \frac{m-1}{\sqrt{(m-1)\cdot m}} \end{bmatrix}_{m\times(m-1)} $$

For each \(k\neq 0\), define matrix \(U_k\),

$$ U_k = \begin{bmatrix} \mathbf{0}_{N_1 \times (N_k - 1)} \\ \mathbf{0}_{N_2 \times (N_k - 1)} \\ \vdots \\ \mathbf{0}_{N_{k-1} \times (N_k - 1)} \\ F_{N_k} \\ \mathbf{0}_{N_{k+1} \times (N_k - 1)} \\ \vdots \\ \mathbf{0}_{N_K \times (N_k - 1)} \end{bmatrix} $$

where \(\mathbf{0}_{N_l \times (N_k - 1)}\) is a zero matrix of dimension \(N_l \times (N_k - 1)\).

📌 It’s easy to check that \(F_m^TF_m = I_{(m-1)\times(m-1)}\), where \(I\) is the identity matrix, and that the columns of \(U_k\) are orthonormal; hence the columns of \(U_k\) form an orthonormal basis of the subspace \(V_k\).
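A sketch of the construction and the orthonormality check (the function name is mine; the columns of \(F_m\) are essentially Helmert-type contrasts):

```python
import numpy as np

def F_matrix(m):
    """Columns of F_m: an orthonormal basis of the zero-sum subspace of R^m."""
    F = np.zeros((m, m - 1))
    for j in range(1, m):                        # column j, as in the definition above
        F[:j, j - 1] = 1.0 / np.sqrt(j * (j + 1))
        F[j, j - 1] = -j / np.sqrt(j * (j + 1))
    return F

F = F_matrix(5)
assert np.allclose(F.T @ F, np.eye(4))           # F_m^T F_m = I
assert np.allclose(F.sum(axis=0), 0.0)           # every column sums to zero
```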

5.2 Find orthonormal basis of subspace \(V_0\)

We are actually looking for eigenvectors \(w \in V_0\) such that \(\Sigma w = \lambda w\). Recalling the form of \(w\) in (2) and the block structure of \(\Sigma\) in (1), it is straightforward to check that \(\Sigma w = \lambda w\) if and only if \(B\cdot(x_1, x_2,…, x_K) = \lambda\cdot(x_1, x_2,…, x_K) \), where \(B\) is defined below.

$$ B = \begin{bmatrix} 1 + (N_1 - 1)\rho_{11} & N_2 \rho_{12} & N_3\rho_{13} & \cdots & N_K\rho_{1K} \\ N_1 \rho_{21} & 1+(N_2-1) \rho_{22} & N_3\rho_{23} & \cdots & N_K\rho_{2K} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ N_1\rho_{K1} & N_2\rho_{K2} &N_3\rho_{K3} & \cdots & 1+(N_K-1)\rho_{KK} \end{bmatrix} $$

📌 Therefore, \(x\in R^K\) is an eigenvector of \(B\) with eigenvalue \(\lambda\) if and only if \(P(x)\in R^n\) is an eigenvector of \(\Sigma\) with eigenvalue \(\lambda\). The problem has been reduced from the \(n\)-dimensional space to the \(K\)-dimensional space. Once we have derived the eigenvectors of \(B\), we can apply the mapping \(P(\cdot)\) to get eigenvectors of \(\Sigma\). If the resulting eigenvectors of \(\Sigma\) are not orthogonal (which can only happen among eigenvectors sharing a repeated eigenvalue, since \(\Sigma\) is symmetric), one can use the Gram-Schmidt process to orthogonalize them.
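The correspondence between eigen-pairs of \(B\) and of \(\Sigma\) can be checked numerically. The sketch below continues the \(K=3\) example (the values are mine) and verifies \(\Sigma P(x) = \lambda P(x)\) for every eigen-pair \((\lambda, x)\) of \(B\):

```python
import numpy as np

rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.4, 0.3],
                [0.1, 0.3, 0.6]])
sizes = np.array([4, 3, 2])
labels = np.repeat(np.arange(3), sizes)
sigma = rho[np.ix_(labels, labels)].astype(float)
np.fill_diagonal(sigma, 1.0)

# B[k, l] = N_l * rho_kl off the diagonal, 1 + (N_k - 1) * rho_kk on the diagonal
B = rho * sizes[None, :]
np.fill_diagonal(B, 1.0 + (sizes - 1) * np.diag(rho))

lam, X = np.linalg.eig(B)            # K eigen-pairs of the small K x K matrix
lam, X = lam.real, X.real            # the eigenvalues are real: they are eigenvalues of sigma
for lam_i, x_i in zip(lam, X.T):
    w = np.repeat(x_i, sizes)
    w /= np.linalg.norm(w)           # w = P(x_i)
    assert np.allclose(sigma @ w, lam_i * w)
```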

6. Decompose \(\Sigma\)

Now we can eigen-decompose \(\Sigma\) as \(\Sigma=UDU^T\), where \(U\) is an orthogonal matrix and \(D\) is a diagonal matrix, both defined below. \(G\) is an \(n\times K\) matrix whose columns are \(P(x)\) with \(x\) ranging over the eigenvectors of \(B\). $$ U =\begin{bmatrix} G & U_1 & U_2 & U_3 & \cdots & U_K \end{bmatrix} = \begin{bmatrix} & F_{N_1} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\ & \mathbf{0} & F_{N_2} & \mathbf{0} & \cdots & \mathbf{0} \\ G & \mathbf{0} & \mathbf{0} & F_{N_3} & \cdots & \mathbf{0} \\ & \vdots & \vdots & \vdots & \vdots & \vdots \\ & \mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & F_{N_K} \end{bmatrix} $$

$$ D = \text{diag}\lbrace \underbrace{\lambda_1, …, \lambda_K}_{\text{eigenvalues of B}}, \underbrace{1-\rho_{11},…,1-\rho_{11}}_{N_1-1},\cdots, \underbrace{1-\rho_{KK},…,1-\rho_{KK}}_{N_K-1} \rbrace $$

📌 The only calculation required is finding the eigen-pairs of \(B\), which takes \(O(K^3)\) steps, much less than the \(O(n^3)\) steps needed to decompose \(\Sigma\) directly.
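Putting the pieces together, here is a sketch that assembles \(U\) and \(D\) for the running example and verifies \(\Sigma = UDU^T\) with \(U\) orthogonal (all names and example values are mine):

```python
import numpy as np

rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.4, 0.3],
                [0.1, 0.3, 0.6]])
sizes = np.array([4, 3, 2])
n, K = sizes.sum(), len(sizes)
labels = np.repeat(np.arange(K), sizes)
sigma = rho[np.ix_(labels, labels)].astype(float)
np.fill_diagonal(sigma, 1.0)

def F_matrix(m):
    F = np.zeros((m, m - 1))
    for j in range(1, m):
        F[:j, j - 1] = 1.0 / np.sqrt(j * (j + 1))
        F[j, j - 1] = -j / np.sqrt(j * (j + 1))
    return F

# Eigen-pairs of B give G; the F blocks give U_1, ..., U_K
B = rho * sizes[None, :]
np.fill_diagonal(B, 1.0 + (sizes - 1) * np.diag(rho))
lam, X = np.linalg.eig(B)
lam, X = lam.real, X.real
G = np.column_stack([np.repeat(x, sizes) / np.linalg.norm(np.repeat(x, sizes))
                     for x in X.T])

blocks = np.zeros((n, n - K))            # block-diagonal [F_{N_1}, ..., F_{N_K}]
row = col = 0
for m in sizes:
    blocks[row:row + m, col:col + m - 1] = F_matrix(m)
    row, col = row + m, col + m - 1

U = np.hstack([G, blocks])
D = np.concatenate([lam, np.repeat(1.0 - np.diag(rho), sizes - 1)])
assert np.allclose(U @ np.diag(D) @ U.T, sigma)   # sigma = U D U^T
assert np.allclose(U.T @ U, np.eye(n))            # U is orthogonal
```

Because the eigenvalues of \(B\) are distinct in this example, the columns of \(G\) are automatically orthogonal; otherwise the Gram-Schmidt step mentioned in section 5.2 would be needed.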

7. Sample \(X\)

Once one has the eigen-decomposition of \(\Sigma\) as in section 6, a sample of \(X\) can be calculated as \(\tilde{X}=U\sqrt{D}Z\).

The calculation takes \(O(K^2)+O(n)\) steps: (1) since \(D\) is a diagonal matrix and \(Z\) is a vector, \(\sqrt{D}Z\) takes \(O(n)\) steps; (2) the rows of \(G\) are identical within each group, so the multiplication of \(G\) with the first \(K\) elements of \(\sqrt{D}Z\) can be carried out in \(O(K^2)\) steps; (3) let \(e_k\) be the subvector of \(\sqrt{D}Z\) corresponding to the submatrix \(F_{N_k}\); since any two adjacent rows of \(F_{N_k}\) differ in at most two elements, once the \(i\)-th entry of \(F_{N_k}e_k\) has been computed, only a fixed number of additional steps is needed to get the \((i+1)\)-th entry, so \(F_{N_k}e_k\) can be computed in \(O(N_k)\) steps. In summary, the total number of steps required is about \(O(K^2)+O(n)\).
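Here is a sketch of the structured multiplication described above, assuming the eigen-pairs of \(B\) and the distinct rows of \(G\) have already been computed as in section 6 (all function and argument names are mine):

```python
import numpy as np

def fast_sample(G_rows, lam, rho_diag, sizes, rng=None):
    """Draw one sample X = U sqrt(D) Z in O(K^2) + O(n) steps.

    G_rows: K x K array whose k-th row is the common row of G within group k;
    lam: eigenvalues of B; rho_diag: (rho_11, ..., rho_KK); sizes: (N_1, ..., N_K).
    """
    rng = np.random.default_rng() if rng is None else rng
    sizes = np.asarray(sizes)
    K, n = len(sizes), sizes.sum()
    z = rng.standard_normal(n)

    # (1) y = sqrt(D) Z in O(n), with D ordered as in section 6
    d = np.concatenate([lam, np.repeat(1.0 - np.asarray(rho_diag), sizes - 1)])
    y = np.sqrt(d) * z

    # (2) G @ y[:K]: only K distinct rows of G, so O(K^2) multiplications
    x = np.repeat(G_rows @ y[:K], sizes)

    # (3) block-diagonal F part in O(N_k) per group, using suffix sums
    offset, row = K, 0
    for m in sizes:
        e = y[offset:offset + m - 1]
        j = np.arange(1, m)                                   # column indices 1..m-1
        scaled = e / np.sqrt(j * (j + 1))
        suffix = np.concatenate([np.cumsum(scaled[::-1])[::-1], [0.0]])
        contrib = suffix                                      # row i: sum_{j >= i} e_j / sqrt(j(j+1))
        contrib[1:] -= j * scaled                             # row i+1 also gets -i * e_i / sqrt(i(i+1))
        x[row:row + m] += contrib
        offset, row = offset + m - 1, row + m
    return x
```

One way to sanity-check such an implementation is to compare its output (with a fixed random seed) against the dense product \(U\sqrt{D}Z\) from the section 6 sketch.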

8. Closing Remarks

It’s important to keep in mind that all of the above calculations are feasible because of two assumptions: (1) \(\Sigma\) is positive semi-definite; (2) \(\Sigma\) has a block structure. Sometimes one would first specify the block structure of a correlation matrix based on data or experience; however, it’s possible that such a correlation matrix is not positive semi-definite. Regarding this issue, the paper also mentions how to ‘fix’ the correlation matrix by setting the negative eigenvalues to zero. Interested readers can refer to the paper1.
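A minimal sketch of that kind of fix, assuming one simply clips the negative eigenvalues to zero and then rescales so the diagonal is exactly one (the rescaling step is my addition; the paper may handle it differently):

```python
import numpy as np

def clip_to_psd(sigma):
    """Set negative eigenvalues of a symmetric matrix to zero, then restore unit diagonal."""
    eigvals, U = np.linalg.eigh(sigma)
    fixed = U @ np.diag(np.clip(eigvals, 0.0, None)) @ U.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)      # rescale so the result is again a correlation matrix
```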

   


  1. Correlation matrix with block structure and efficient sampling methods ↩︎ ↩︎

  2. Note that Cholesky factorization \(\Sigma=LL^T\) takes \(O(n^3)\) time, while sampling using \(LZ\) takes \(O(n^2)\) time. ↩︎

  3. An orthogonal matrix is a square matrix with orthonormal columns and orthonormal rows. ↩︎