Session 5: MATLAB backslash & some pitfalls

Date: 10/16/2017, Monday

In [1]:
format compact

Different behaviors of backslash

You’ve already used the backslash \ a lot to solve linear systems. But if you look at MATLAB backslash documentation you will find it can do much more than solving linear systems. This powerful feature can sometimes be very confusing if you are not careful.

Standard linear system

First see a standard problem: a 4x4, full rank square matrix \(A\) and a 4-element column vector \(b\). We want to solve

\[Ax = b\]
In [2]:
b = rand(4,1)
b =
    0.8147
    0.9058
    0.1270
    0.9134
In [3]:
A = rand(4,4)
A =
    0.6324    0.9575    0.9572    0.4218
    0.0975    0.9649    0.4854    0.9157
    0.2785    0.1576    0.8003    0.7922
    0.5469    0.9706    0.1419    0.9595

Well, not big deal…

In [4]:
x = A\b
x =
   -0.0486
    0.9204
   -0.0630
    0.0579

We can verify the result:

In [5]:
A*x - b % almost zero
ans =
   1.0e-15 *
         0
         0
   -0.0555
   -0.1110

Incorrect shape

You should already notice that \(b\) must be a column vector.

In [6]:
x = A\b' % row vector doesn't work
Error using \
Matrix dimensions must agree.

Also, \(A\) and \(b\) must have the same number of rows, because \(A\) and \(b\) come from the following linear system with \(n\) rows (equations) and \(m\) columns (variables):

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + ... + a_{1m}x_m = b_1 \\ a_{21}x_1 + a_{22}x_2 + ... + a_{1m}x_m = b_2 \\ ... \\ a_{n1}x_1 + a_{n2}x_2 + ... + a_{1m}x_m = b_n \\ \end{array} \right.\end{split}\]
In [7]:
A = rand(3,4) % A has 3 rows but b has 4 rows
A =
    0.6557    0.9340    0.7431    0.1712
    0.0357    0.6787    0.3922    0.7060
    0.8491    0.7577    0.6555    0.0318
In [8]:
A\b % doesn't work
Error using \
Matrix dimensions must agree.

Looks like \(A\) has to be a square matrix, but see below…

Over-detemined linear system

In fact, A doesn’t have to be a square matrix, just like a linear system doesn’t need to have the same number of rows (equations) and columns (variables).

Here we create an over-determined system

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3 \\ a_{41}x_1 + a_{42}x_2 + a_{43}x_3 = b_4 \end{array} \right.\end{split}\]

An over-determined system means A is tall and narrow (more rows than columns)

In [9]:
A = rand(4,3) % more equations (rows) than variables (columns)
A =
    0.2769    0.6948    0.4387
    0.0462    0.3171    0.3816
    0.0971    0.9502    0.7655
    0.8235    0.0344    0.7952
In [10]:
x = A\b % backslash works
x =
    0.9798
    0.2901
    0.2142

MATLAB does return a set of \((x_1,x_2,x_3)\). But you know this can’t be the solution because you can’t fulfill 4 equations by just 3 degrees of freedom.

In [11]:
A*x - b % not zero, so x is not a solution
ans =
   -0.2479
   -0.6868
    0.4078
    0.0738

So what’s x? It is actually a least-square fit, same as the result from the normal equation

\[A^TAx=A^Tb\]
In [12]:
(A'*A)\(A'*b) % solve normal equation
ans =
    0.9798
    0.2901
    0.2142

In MATLAB, simply using A\b is actually more accurate than (A'*A)\(A'*b) for solving least squares, especially for large matrices. That’s because the condition number of \(A^TA\) could be very large.

In [13]:
cond(A)
cond(A'*A)
ans =
    8.7242
ans =
   76.1114

We find \(cond(A^TA) = cond(A)^2\). It is not quite large in this tiny case but could explode for large matrices

So how to avoid the normal equation?

Recall that, for a standard, full rank system \(Ax=b\), the code A\b doesn’t compute \(A^{-1}b\) at all, because \(A^{-1}\) is often ill-conditioned. It uses LU factorization which is better-conditioned.

Similarly, for an over-determined \(Ax=b\), the code A\b doesn’t compute \(A^TA\) at all. It uses QR factorization or similar techniques to make the problem better-conditioned.

Under-determined linear system

Now let’s look at an under-determined system

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + a_{15}x_5 = b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + a_{25}x_5 = b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + a_{35}x_5 = b_3 \\ a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 + a_{45}x_5 = b_4 \end{array} \right.\end{split}\]

An under-determined system means A is short and wide (more columns than rows)

In [14]:
A = rand(4,5) % less equations (rows) than variables (columns)
A =
    0.1869    0.7094    0.6551    0.9597    0.7513
    0.4898    0.7547    0.1626    0.3404    0.2551
    0.4456    0.2760    0.1190    0.5853    0.5060
    0.6463    0.6797    0.4984    0.2238    0.6991
In [15]:
x = A\b % backslash still works
x =
   -0.1886
    1.4263
    0.2995
   -0.3730
         0

We can see x is the solution to \(Ax=b\):

In [16]:
A*x - b % almost zero
ans =
   1.0e-15 *
    0.4441
    0.3331
    0.0278
    0.1110

However, we know that an under-determined system has infinite number of solutions. But MATLAB just returns one value. What’s special about this value?

It turns out that the x we get here has the smallest norm \(||x||_2\) among all possible solutions.

In [17]:
norm(x) % smaller than any other possible solutions
ans =
    1.5162

Another multiple-behavior example

Because the backslash operator \ has different behaviors under different circumstances, you have to be very careful. Sometimes your matrix shape might be wrong, but MATLAB will still return a result. But in that case you might be solving a least square problem instead of a full-rank linear system!

MATLAB has quite a lot of multi-behavior (“poly-algorithm”) functions. Another example is the built-in lu() function for LU factorization.

In [18]:
A = rand(4,4) % to be LU-factorized
A =
    0.8909    0.1493    0.8143    0.1966
    0.9593    0.2575    0.2435    0.2511
    0.5472    0.8407    0.9293    0.6160
    0.1386    0.2543    0.3500    0.4733

Just calling lu() with no return, you get a single matrix with \(L\) and \(U\) stacked together.

In [19]:
lu(A)
ans =
    0.9593    0.2575    0.2435    0.2511
    0.5704    0.6938    0.7903    0.4728
    0.9287   -0.1295    0.6905    0.0246
    0.1445    0.3129    0.0978    0.2867

Using [L,U] to hold the return, you get two separate matrices. Notice that L is not a strict lower-triangular matrix, because it also incorporates the pivoting matrix \(P\)

In [20]:
[L,U] = lu(A)
L =
    0.9287   -0.1295    1.0000         0
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867

Using [L,U,P] to hold the return, you get all three matrices. Now L is strictly lower-triangular.

In [21]:
[L,U,P] = lu(A)
L =
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.9287   -0.1295    1.0000         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867
P =
     0     1     0     0
     0     0     1     0
     1     0     0     0
     0     0     0     1

You can ignore the returning P by ~, but keep in mind that the returning L is different from that in cell [20], although the code looks very similar.

In [22]:
[L,U,~] = lu(A)
L =
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.9287   -0.1295    1.0000         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867