Section 5.3 The QR Factorization
Our goal in this section is to improve on the efficiency of solving least squares problems via the normal equations. Recall that we want to find \(\bx\) such that \(A \bx = \bb\) where \(\bb\) is not in the column space of \(A\). This necessitated the normal equations \(A^T A \bx = A^T \bb\), which project \(\bb\) onto the column space of \(A\). This would be FAR more efficient if the columns of \(A\) were orthogonal (perpendicular) and normalized (unit vectors). Hence, our goal will be to take \(A\) and factor it into \(A = QR\) where the columns of \(Q\) are orthonormal (orthogonal and normalized) and \(R\) is an upper triangular matrix.
First we'll do a problem by hand:
Consider the matrix \(A = \begin{pmatrix}1 \amp 1 \amp 0 \\ 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp 1 \end{pmatrix}\). We want to factor \(A\) into \(A = QR\) where the columns of \(Q\) are orthonormal and \(R\) is upper triangular. Here is the algorithm (run every step by hand!). For notation purposes, \(\ba_j\) will be the \(j^{th}\) column of \(A\).
Define \(\bq_1 = \displaystyle \frac{\ba_1}{\|\ba_1\|}\). This will be the first column of \(Q\).
Define \(\bq_2 = \displaystyle \ba_2 - \left( \ba_2 \cdot \bq_1 \right) \bq_1\). Once you've done this calculation normalize your result so \(\displaystyle \bq_2 = \frac{\bq_2}{\|\bq_2\|}\). This is the second column of \(Q\). Explain the geometry of this step (DRAW A PICTURE!)
Define \(\bq_3 = \displaystyle \ba_3 - \left( \ba_3 \cdot \bq_1 \right)\bq_1 - \left( \ba_3 \cdot \bq_2 \right)\bq_2\) and then redefine \(\displaystyle \bq_3 = \frac{\bq_3}{\|\bq_3\|}\). This is now the third column of \(Q\).
The matrix \(R\) is formed as follows: \begin{equation*} R = \begin{pmatrix}\ba_1 \cdot \bq_1 \amp \ba_2 \cdot \bq_1 \amp \ba_3 \cdot \bq_1 \\ 0 \amp \ba_2 \cdot \bq_2 \amp \ba_3 \cdot \bq_2 \\ 0 \amp 0 \amp \ba_3 \cdot \bq_3 \end{pmatrix} \end{equation*}
Write down \(Q\) and observe that the process is just begging for a couple of nested loops so that a computer can carry it out on bigger matrices.
Use MATLAB to check that \(A = QR\).
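If you would like a second check on the hand computation, the steps above produce the \(Q\) and \(R\) hard-coded below. Here is a quick check sketched in Python with NumPy as a stand-in for the MATLAB check (the variable names are our own):

```python
import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)

# Q and R obtained by carrying out the steps above by hand.
Q = np.array([[1/s2,  1/s6, -1/s3],
              [1/s2, -1/s6,  1/s3],
              [0.0,   2/s6,  1/s3]])
R = np.array([[s2,  1/s2, 1/s2],
              [0.0, s6/2, 1/s6],
              [0.0, 0.0,  2/s3]])

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

print(np.allclose(Q @ R, A))            # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # columns of Q are orthonormal
```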
Finally let's look at the utility of the result:
Since the columns of \(Q\) are orthonormal, \(Q^TQ\) is the identity matrix! (Explain why.)
If we want to solve \(A\bx = \bb\) and we can write \(A = QR\) then \begin{equation*} A \bx = \bb \implies QR \bx = \bb \implies Q^T Q R \bx = Q^T \bb \implies R \bx = Q^T \bb \end{equation*}
Since \(R\) is upper triangular we can use the “Usolve” code we have from our work with the LU-factorization to solve for \(\bx\). THIS IS REALLY FAST!!
Compare this to the number of computer operations needed to solve the normal equations \(A^TA\bx = A^T \bb\) with an \(LU\)-solver.
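As a sketch of the back-substitution step, here is a Python/NumPy stand-in for the "Usolve" idea (the function name `usolve` is our own; your actual Usolve is whatever you wrote for the \(LU\) lab):

```python
import numpy as np

def usolve(R, c):
    """Back substitution for an upper triangular system R x = c."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-solved components, then divide by the diagonal.
        x[i] = (c[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

# Solve A x = b via A = QR: form c = Q^T b, then back substitute.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
Q, R = np.linalg.qr(A)      # built-in QR as a stand-in for your own code
x = usolve(R, Q.T @ b)
print(np.allclose(A @ x, b))   # A is square and invertible here
```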
Write MATLAB code that takes a matrix \(A\) (not necessarily square) and outputs both the \(Q\) matrix and the \(R\) matrix. Pseudocode for this function is as follows:
Set up the function call: function [Q,R] = MyQR(A)
Set up zeros matrices for both \(Q\) and \(R\). Remember that \(A\) may not be square so assume that the columns are in \(\mathbb{R}^m\) and the rows are in \(\mathbb{R}^n\). Hence \(Q\) will be \(m \times n\) and \(R\) will be \(n \times n\).
Start a loop that counts across the columns: for j=1:n
define a temporary variable: \(\hat{\bq} = \ba_j\)
start a loop that will do all of the subtractions and build some of the \(R's\): for i=1:j-1
build one of the \(R's\): \(R_{ij} = \ba_j \cdot \bq_i\)
Do one of the subtractions: \(\hat{\bq} = \hat{\bq} - R_{ij} \bq_i\)
end the loop for i
normalize the \(j^{th}\) column in \(Q\): \(\bq_j = \hat{\bq} / \| \hat{\bq} \|\)
build the \(R's\) on the diagonal: \(R_{jj} = \ba_j \cdot \bq_j\)
end the loop for j
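The pseudocode above translates almost line for line. Since the assignment is in MATLAB, treat the following Python/NumPy version only as a sketch of the loop structure (indices shifted to 0-based; the name `my_qr` is our own):

```python
import numpy as np

def my_qr(A):
    """Gram-Schmidt QR following the pseudocode (0-based indices)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))              # zeros matrices for Q and R
    R = np.zeros((n, n))
    for j in range(n):                # loop across the columns
        q_hat = A[:, j].copy()        # temporary variable q_hat = a_j
        for i in range(j):            # subtractions and off-diagonal R's
            R[i, j] = A[:, j] @ Q[:, i]    # R_ij = a_j . q_i
            q_hat -= R[i, j] * Q[:, i]
        Q[:, j] = q_hat / np.linalg.norm(q_hat)  # normalize column j of Q
        R[j, j] = A[:, j] @ Q[:, j]   # diagonal entry R_jj = a_j . q_j
    return Q, R

# A non-square test matrix with linearly independent columns.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])
Q, R = my_qr(A)
print(np.allclose(Q @ R, A))            # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # orthonormal columns
```

Note that \(Q\) comes out \(4 \times 3\) and \(R\) comes out \(3 \times 3\), matching the sizes in the pseudocode.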
Now that you have \(QR\) and \(LU\) code we're going to use both of them! The problem is as follows:
We are going to find the polynomial of degree 4 that best fits the function \(y = \cos(4t) + 0.1 \varepsilon(t)\) at 50 equally spaced points \(t\) between \(0\) and \(1\). Here we are using \(\varepsilon(t)\) as a function that outputs normally distributed random white noise. In MATLAB you will build \(y\) as
y = cos(4*t) + 0.1*randn(size(t));
Build the \(t\) vector and the \(y\) vector (these are your data). We need to set up the least squares problems \(A \bx = \bb\) by setting up the matrix \(A\) as we did in the other least squares curve fitting problems and by setting up the \(\bb\) vector using the \(y\) data you just built.
Solve the normal equations \(A^T A \bx = A^T \bb\) using your \(LU\) code.
Solve the system \(A \bx = \bb\) by first transforming \(A\) to \(A = QR\) and then solving \(R\bx = Q^T \bb\).
Use MATLAB to find the sum of the square errors between the polynomial approximation and the function \(f(t) = \cos(4t)\) for both the \(QR\) and the \(LU\) approaches.
Build MATLAB code that does parts (a) - (c) several hundred times and compiles results comparing which method gives the better approximation (smaller sum of square errors).
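One possible shape for a single run of this experiment, sketched in Python/NumPy as a stand-in for the MATLAB code (here `np.linalg.solve` and `np.linalg.qr` stand in for your \(LU\) and MyQR codes, and the random seed is our own choice so the run is repeatable):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed; your MATLAB runs will vary

t = np.linspace(0, 1, 50)
y = np.cos(4 * t) + 0.1 * rng.standard_normal(t.size)

# Least squares setup: columns 1, t, t^2, t^3, t^4 for a degree-4 polynomial.
A = np.vander(t, 5, increasing=True)
b = y

# (b) Normal equations route (np.linalg.solve stands in for your LU solver).
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# (c) QR route: solve R x = Q^T b by back substitution.
Q, R = np.linalg.qr(A)
c = Q.T @ b
x_qr = np.zeros(5)
for i in range(4, -1, -1):
    x_qr[i] = (c[i] - R[i, i+1:] @ x_qr[i+1:]) / R[i, i]

# Sum of square errors against the noise-free function f(t) = cos(4t).
f = np.cos(4 * t)
sse_ne = np.sum((A @ x_ne - f) ** 2)
sse_qr = np.sum((A @ x_qr - f) ** 2)
print(sse_ne, sse_qr)
```

On a well-conditioned problem like this one the two solution vectors agree closely; the interesting comparisons (operation counts, and accuracy over many noisy trials) are exactly what the exercise asks you to collect in MATLAB.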