Session 5: MATLAB backslash & some pitfalls¶

Date: 10/16/2017, Monday

In [1]:

format compact

Different behaviors of backslash¶

You’ve already used the backslash \ a lot to solve linear systems. But if you look at MATLAB backslash documentation you will find it can do much more than solving linear systems. This powerful feature can sometimes be very confusing if you are not careful.

Standard linear system¶

First see a standard problem: a 4x4, full rank square matrix \(A\) and a 4-element column vector \(b\). We want to solve

\[Ax = b\]

In [2]:

b = rand(4,1)

In [3]:

A = rand(4,4)

A =
    0.6324    0.9575    0.9572    0.4218
    0.0975    0.9649    0.4854    0.9157
    0.2785    0.1576    0.8003    0.7922
    0.5469    0.9706    0.1419    0.9595

Well, not big deal…

In [4]:

x = A\b

We can verify the result:

In [5]:

A*x - b % almost zero

Incorrect shape¶

You should already notice that \(b\) must be a column vector.

In [6]:

x = A\b' % row vector doesn't work

Error using \
Matrix dimensions must agree.

Also, \(A\) and \(b\) must have the same number of rows, because \(A\) and \(b\) come from the following linear system with \(n\) rows (equations) and \(m\) columns (variables):

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + ... + a_{1m}x_m = b_1 \\ a_{21}x_1 + a_{22}x_2 + ... + a_{1m}x_m = b_2 \\ ... \\ a_{n1}x_1 + a_{n2}x_2 + ... + a_{1m}x_m = b_n \\ \end{array} \right.\end{split}\]

In [7]:

A = rand(3,4) % A has 3 rows but b has 4 rows

A =
    0.6557    0.9340    0.7431    0.1712
    0.0357    0.6787    0.3922    0.7060
    0.8491    0.7577    0.6555    0.0318

In [8]:

A\b % doesn't work

Error using \
Matrix dimensions must agree.

Looks like \(A\) has to be a square matrix, but see below…

Over-detemined linear system¶

In fact, A doesn’t have to be a square matrix, just like a linear system doesn’t need to have the same number of rows (equations) and columns (variables).

Here we create an over-determined system

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3 \\ a_{41}x_1 + a_{42}x_2 + a_{43}x_3 = b_4 \end{array} \right.\end{split}\]

An over-determined system means A is tall and narrow (more rows than columns)

In [9]:

A = rand(4,3) % more equations (rows) than variables (columns)

A =
    0.2769    0.6948    0.4387
    0.0462    0.3171    0.3816
    0.0971    0.9502    0.7655
    0.8235    0.0344    0.7952

In [10]:

x = A\b % backslash works

MATLAB does return a set of \((x_1,x_2,x_3)\). But you know this can’t be the solution because you can’t fulfill 4 equations by just 3 degrees of freedom.

In [11]:

A*x - b % not zero, so x is not a solution

ans =
   -0.2479
   -0.6868
    0.4078
    0.0738

So what’s x? It is actually a least-square fit, same as the result from the normal equation

\[A^TAx=A^Tb\]

In [12]:

(A'*A)\(A'*b) % solve normal equation

In MATLAB, simply using A\b is actually more accurate than (A'*A)\(A'*b) for solving least squares, especially for large matrices. That’s because the condition number of \(A^TA\) could be very large.

In [13]:

cond(A)
cond(A'*A)

ans =
    8.7242
ans =
   76.1114

We find \(cond(A^TA) = cond(A)^2\). It is not quite large in this tiny case but could explode for large matrices

So how to avoid the normal equation?

Recall that, for a standard, full rank system \(Ax=b\), the code A\b doesn’t compute \(A^{-1}b\) at all, because \(A^{-1}\) is often ill-conditioned. It uses LU factorization which is better-conditioned.

Similarly, for an over-determined \(Ax=b\), the code A\b doesn’t compute \(A^TA\) at all. It uses QR factorization or similar techniques to make the problem better-conditioned.

Under-determined linear system¶

Now let’s look at an under-determined system

\[\begin{split}\left\{ \begin{array}{} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + a_{15}x_5 = b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + a_{25}x_5 = b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + a_{35}x_5 = b_3 \\ a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 + a_{45}x_5 = b_4 \end{array} \right.\end{split}\]

An under-determined system means A is short and wide (more columns than rows)

In [14]:

A = rand(4,5) % less equations (rows) than variables (columns)

A =
    0.1869    0.7094    0.6551    0.9597    0.7513
    0.4898    0.7547    0.1626    0.3404    0.2551
    0.4456    0.2760    0.1190    0.5853    0.5060
    0.6463    0.6797    0.4984    0.2238    0.6991

In [15]:

x = A\b % backslash still works

We can see x is the solution to \(Ax=b\):

In [16]:

A*x - b % almost zero

However, we know that an under-determined system has infinite number of solutions. But MATLAB just returns one value. What’s special about this value?

It turns out that the x we get here has the smallest norm \(||x||_2\) among all possible solutions.

In [17]:

norm(x) % smaller than any other possible solutions

ans =
    1.5162

Another multiple-behavior example¶

Because the backslash operator \ has different behaviors under different circumstances, you have to be very careful. Sometimes your matrix shape might be wrong, but MATLAB will still return a result. But in that case you might be solving a least square problem instead of a full-rank linear system!

MATLAB has quite a lot of multi-behavior (“poly-algorithm”) functions. Another example is the built-in lu() function for LU factorization.

In [18]:

A = rand(4,4) % to be LU-factorized

A =
    0.8909    0.1493    0.8143    0.1966
    0.9593    0.2575    0.2435    0.2511
    0.5472    0.8407    0.9293    0.6160
    0.1386    0.2543    0.3500    0.4733

Just calling lu() with no return, you get a single matrix with \(L\) and \(U\) stacked together.

In [19]:

lu(A)

ans =
    0.9593    0.2575    0.2435    0.2511
    0.5704    0.6938    0.7903    0.4728
    0.9287   -0.1295    0.6905    0.0246
    0.1445    0.3129    0.0978    0.2867

Using [L,U] to hold the return, you get two separate matrices. Notice that L is not a strict lower-triangular matrix, because it also incorporates the pivoting matrix \(P\)

In [20]:

[L,U] = lu(A)

L =
    0.9287   -0.1295    1.0000         0
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867

Using [L,U,P] to hold the return, you get all three matrices. Now L is strictly lower-triangular.

In [21]:

[L,U,P] = lu(A)

L =
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.9287   -0.1295    1.0000         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867
P =
     0     1     0     0
     0     0     1     0
     1     0     0     0
     0     0     0     1

You can ignore the returning P by ~, but keep in mind that the returning L is different from that in cell [20], although the code looks very similar.

In [22]:

[L,U,~] = lu(A)

L =
    1.0000         0         0         0
    0.5704    1.0000         0         0
    0.9287   -0.1295    1.0000         0
    0.1445    0.3129    0.0978    1.0000
U =
    0.9593    0.2575    0.2435    0.2511
         0    0.6938    0.7903    0.4728
         0         0    0.6905    0.0246
         0         0         0    0.2867