Egwald Mathematics - Linear Algebra: Systems of Linear Equations


Egwald Mathematics: Linear Algebra

Systems of Linear Equations

by

Elmer G. Wiens


Systems of Linear Equations.

A system of linear equations can be expressed as:

a_1,1 x₁ + a_1,2 x₂ + . . . + a_1,n x_n = b₁

a_2,1 x₁ + a_2,2 x₂ + . . . + a_2,n x_n = b₂

. . . . . . . . . . .

a_m,1 x₁ + a_m,2 x₂ + . . . + a_m,n x_n = b_m

Example.

6 * x₁	+	2 * x₂	+	2 * x₃	=	6
4 * x₁	+	-3 * x₂	+	5 * x₃	=	-4
1 * x₁	+	6 * x₂	+	4 * x₃	=	3

Matrix Representation.

The same problem expressed in matrix and vector form is:

A *x = b

where A = [a_{i, j}] is the m by n matrix of coefficients, x^T = (x₁, x₂, . . . . x_n) is the vector of unknowns, and b^T = (b₁, b₂, . . . . b_m) is the vector of RHS (right hand side) coefficients.

If all the coefficients of the vector b are zero, the system A * x = b is called a homogeneous system.

Otherwise, the system is called nonhomogeneous.

Example: Nonhomogeneous System.

A =
[ 6   2   2 ]
[ 4  -3   5 ]
[ 1   6   4 ]

x^T = (x₁, x₂, x₃)

b^T = (6, -4, 3)

The columns (and rows) of the matrix A are linearly independent, as confirmed by the following diagram, where I use the standard (x, y, z) coordinate system.

Please note that I distinguish between the axis x, and the vector x^T = (x₁, x₂, x₃). Also, note that while the vector x^T is a row vector, and the vector x is a column vector, the diagrams below do not distinguish between column and row vectors.

Graphically, the matrix A, its column vectors, and the column vector b can be represented as follows:

Column Space of a Matrix A

Since the columns of A span R³, one can express any three dimensional vector, such as b in the example, as a linear combination of the columns of A. The solution to the linear equation problem, the vector

x^T = (x₁, x₂, x₃) = ( 1, 1, -1)

provides the coefficients of the linear combination of the columns of A that yield b.

A * x = b:

[ 6   2   2 ]   [  1 ]   [  6 ]
[ 4  -3   5 ] * [  1 ] = [ -4 ]
[ 1   6   4 ]   [ -1 ]   [  3 ]
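The statement that x holds the coefficients of a linear combination of the columns of A can be checked with a few lines of Python. This is only a sketch; the helper names column and linear_combination are mine, not part of the page.

```python
# Coefficient matrix, right-hand side, and solution from the nonhomogeneous example.
A = [[6, 2, 2],
     [4, -3, 5],
     [1, 6, 4]]
b = [6, -4, 3]
x = [1, 1, -1]

def column(M, j):
    """Return column j of the matrix M as a list."""
    return [row[j] for row in M]

def linear_combination(coeffs, vectors):
    """Return sum_j coeffs[j] * vectors[j], computed componentwise."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

columns = [column(A, j) for j in range(3)]
combo = linear_combination(x, columns)   # 1*a1 + 1*a2 - 1*a3
print(combo)  # [6, -4, 3]
```

The result equals b, so x^T = (1, 1, -1) solves the system.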

Example: Homogeneous System.

Since the columns of A span R³, with the RHS vector b^T = (0, 0, 0) the only solution is x^T = (0, 0, 0), called the zero solution.

Any homogeneous system, (ie a homogeneous system with arbitrary matrix A and b^T = o^T the zero vector), always has at least one solution, the zero or trivial solution where x^T = o^T.

However, if the columns of A are not linearly independent, its homogeneous system has nontrivial solutions.

General Solution.

If x_p is a particular solution of the nonhomogeneous system, and x_h is a solution of the homogeneous system, then:

A * (x_p + x_h) = b + o = b,

proving that (x_p + x_h) is also a solution of the nonhomogeneous system.

The Image of a Matrix.

Let A be an m by n matrix. Write A = {a₁, a₂, ... a_n} as the set of m by 1 column vectors of A, ie A = [a₁, a₂, ... a_n].

The image of A, im(A), is the span of the columns of A. Formally:

im(A) = span(A) = {y in R^m | y = A * x, for some x in Rⁿ}.

Sometimes this is referred to as the column space of the matrix A, denoted by R(A).

The Kernel of a Matrix A.

The kernel of A, ker(A), is the set of all vectors x in Rⁿ for which A * x = o, the zero vector in R^m. Formally:

ker(A) = {x in Rⁿ | A * x = 0 in R^m}

Sometimes this is referred to as the null space of the matrix A, denoted by N(A).

The matrix A, and its transpose matrix A^T (represented by A' in the diagrams below) generate four important subspaces: im(A), ker(A), im(A^T), and ker(A^T).

Example: General Solution with Linearly Dependent Columns (and Rows).

The matrix of the next linear equation problem has linearly dependent columns (and rows).

[ 8   2   3 ]   [ x₁ ]   [  4 ]
[ 2   4   6 ] * [ x₂ ] = [ -6 ]
[ 1   2   3 ]   [ x₃ ]   [ -3 ]

The following diagram displays this linear equation problem. The image of the matrix A, Im(A), is the two dimensional yellow subspace of R³ generated by the columns of A.

Image of the Matrix A in Yellow

Because the vector b^T = (4, -6, -3) lies within the Im(A), it can be expressed as a linear combination of the columns of A. A particular solution to the linear equation problem is x_p^T = (1, -2, 0), since:

A * x_p = b:

[ 8   2   3 ]   [  1 ]   [  4 ]
[ 2   4   6 ] * [ -2 ] = [ -6 ]
[ 1   2   3 ]   [  0 ]   [ -3 ]

The ker(A) is the line (also a subspace) through the vector x_h^T = (0, -6, 4), since x_h is a solution of the homogeneous system for the matrix A:

A * x_h = o:

[ 8   2   3 ]   [  0 ]   [ 0 ]
[ 2   4   6 ] * [ -6 ] = [ 0 ]
[ 1   2   3 ]   [  4 ]   [ 0 ]

Now let us look at A^T, the transpose matrix of the matrix A.

The image of the matrix A^T, Im(A^T), is the two dimensional yellow subspace of R³ generated by the rows of A (the columns of A^T).

Image of the Matrix A transpose in Yellow

As suggested by the diagram, ker(A) — the line through the vector x_h — is perpendicular to the Im(A^T) subspace.

Similarly, the kernel of the matrix A^T, ker(A^T) is the line through the vector y_h^T = (0, -3, 6), since y_h is a solution of the homogeneous system for the matrix A^T:

A^T * y_h = o:

[ 8   2   1 ]   [  0 ]   [ 0 ]
[ 2   4   2 ] * [ -3 ] = [ 0 ]
[ 3   6   3 ]   [  6 ]   [ 0 ]

The subspace ker(A^T) is perpendicular to the subspace Im(A), as one can see two diagrams above.
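The two perpendicularity claims can be verified numerically: x_h should have zero dot product with every row of A (the rows span Im(A^T)), and y_h should have zero dot product with every column of A (the columns span Im(A)). A minimal sketch, with dot as my own helper name:

```python
def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[8, 2, 3],
     [2, 4, 6],
     [1, 2, 3]]
x_h = [0, -6, 4]   # spans ker(A)
y_h = [0, -3, 6]   # spans ker(A^T)

rows = A                                          # rows of A span Im(A^T)
cols = [[row[j] for row in A] for j in range(3)]  # columns of A span Im(A)

row_dots = [dot(r, x_h) for r in rows]
col_dots = [dot(c, y_h) for c in cols]
print(row_dots, col_dots)  # [0, 0, 0] [0, 0, 0]
```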

Example: General Solution.

The general solution to the linear equation problem A * x = b is x = x_p + s * x_h for any scalar s. If x is defined by:

[ x₁ ]   [  1 ]       [  0 ]
[ x₂ ] = [ -2 ] + s * [ -6 ]
[ x₃ ]   [  0 ]       [  4 ]

then

A * x = A * x_p + s * A * x_h = b + s * o = b.
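As a quick check of the general solution, the following sketch evaluates A * (x_p + s * x_h) for a few arbitrarily chosen values of s. The sample values of s and the helper matvec are my own choices.

```python
A = [[8, 2, 3],
     [2, 4, 6],
     [1, 2, 3]]
b = [4, -6, -3]
x_p = [1, -2, 0]   # particular solution
x_h = [0, -6, 4]   # solution of the homogeneous system

def matvec(M, v):
    """Matrix-vector product M * v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

results = []
for s in (-2, 0, 0.5, 3):                       # arbitrary sample scalars
    x = [p + s * h for p, h in zip(x_p, x_h)]
    results.append(matvec(A, x))

print(all(r == b for r in results))  # True
```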

Example: Inconsistent System.

Consider the system with the same matrix A as in the example above, but with a RHS vector b that lies outside of the Im(A).

[ 8   2   3 ]   [ x₁ ]   [  4 ]
[ 2   4   6 ] * [ x₂ ] = [ -6 ]
[ 1   2   3 ]   [ x₃ ]   [  3 ]

The following diagram displays this linear equation problem. The image of the matrix A, Im(A), is the same two dimensional yellow subspace of R³ generated by the columns of A.

Inconsistent System A*x = b

However, the vector b^T = (4, -6, 3) lies outside the Im(A) subspace. Therefore, it cannot be expressed as a linear combination of the columns of A. Thus, this linear equation problem has no particular solution, although its homogeneous system has solutions consisting of each vector on the line through the vector x_h^T = (0, -6, 4).

Solutions: Inconsistent System.

In the diagram above, I projected the vector b^T = (4, -6, 3) onto the Im(A) subspace, yielding the vector p^T = (4, -3.6, -1.8). Recall that the point of the vector p is the closest point on Im(A) to the point of the vector b. Consequently, the vector (b - p) is perpendicular to the vector p. To verify this, note that the dot product p^T * (b - p) = (4, -3.6, -1.8) * (0, -2.4, 4.8)^T = 0 + 8.64 - 8.64 = 0.
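One way to reproduce the projection p is sketched below. It relies on a shortcut that works for this example only: Im(A) is a plane in R³, and the vector y_h^T = (0, -3, 6) from ker(A^T) is normal to that plane, so p is b minus its component along y_h. The variable names are my own.

```python
def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

b = [4, -6, 3]
n = [0, -3, 6]          # y_h, a normal vector to the plane Im(A)

t = dot(b, n) / dot(n, n)                      # component of b along n
p = [bi - t * ni for bi, ni in zip(b, n)]      # projection of b onto Im(A)
r = [bi - pi for bi, pi in zip(b, p)]          # the vector b - p

print([round(v, 4) for v in p])   # [4.0, -3.6, -1.8]
print(round(dot(p, r), 10))       # zero up to roundoff
```

For a general matrix one would not have the normal vector at hand, which is why the least-squares machinery is needed.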

A case can be made that the vector p^T = (4, -3.6, -1.8) provides a "solution" to the inconsistent system A * x = b, if one solves the system A * x = p, instead.

In least-squares problems, with more equations than unknowns, the solution provided by the algorithms is the vector x such that A * x = p. That is A * x equals the projection of the vector b onto Im(A).

Writing the matrix A = [a₁, a₂, a₃], one can see that a₃ = 1.5 * a₂, ie a₂ and a₃ are linearly dependent. Dropping the vector a₂ yields the least-squares problem B * z = b given by:

[ 8   3 ]            [  4 ]
[ 2   6 ] * [ z₁ ] = [ -6 ]
[ 1   3 ]   [ z₂ ]   [  3 ]

The least-squares solution is z^T = (0.8286, -0.8762) since

B * z = p:

[ 8   3 ]                  [  4   ]
[ 2   6 ] * [  0.82857 ] = [ -3.6 ]
[ 1   3 ]   [ -0.87619 ]   [ -1.8 ]

A least-squares problem always has a solution for B * z = p, since p is in the Im(B). Furthermore, if the columns of B are linearly independent, the solution vector z is unique.
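The least-squares solution z can be recomputed from the 2 by 2 system B^T * B * z = B^T * b. The sketch below uses exact rational arithmetic from Python's fractions module and solves the system with the explicit 2 by 2 inverse; this is my own illustration, not necessarily how the page's numbers were produced.

```python
from fractions import Fraction

B = [[8, 3],
     [2, 6],
     [1, 3]]
b = [4, -6, 3]

# Entries of the symmetric matrix B^T * B and of the vector B^T * b.
g11 = sum(r[0] * r[0] for r in B)            # 69
g12 = sum(r[0] * r[1] for r in B)            # 39
g22 = sum(r[1] * r[1] for r in B)            # 54
c1 = sum(r[0] * bi for r, bi in zip(B, b))   # 23
c2 = sum(r[1] * bi for r, bi in zip(B, b))   # -15

# Solve the 2 by 2 system with the explicit inverse.
det = g11 * g22 - g12 * g12
z1 = Fraction(c1 * g22 - g12 * c2, det)
z2 = Fraction(g11 * c2 - g12 * c1, det)

p = [r[0] * z1 + r[1] * z2 for r in B]       # B * z
print(z1, z2)                    # 29/35 -92/105
print([float(v) for v in p])     # [4.0, -3.6, -1.8]
```

In decimals, 29/35 = 0.82857 and -92/105 = -0.87619, and B * z reproduces the projection p.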

To get back to the inconsistent system, write w^T = (0.82857, 0, -0.87619). Then, w is a solution of A * x = p since

A * w = p:

[ 8   2   3 ]   [  0.82857 ]   [  4   ]
[ 2   4   6 ] * [  0       ] = [ -3.6 ]
[ 1   2   3 ]   [ -0.87619 ]   [ -1.8 ]

However, the vector x = w + s * x_h, where s is a scalar and x_h is a solution of A * x = o, given by:

[ x₁ ]   [  0.8286 ]       [  0 ]
[ x₂ ] = [  0      ] + s * [ -6 ]
[ x₃ ]   [ -0.8762 ]       [  4 ]

is also a solution to A * x = p, because A * (w + s * x_h) = A * w + s * A * x_h = p + s * o = p.

If small changes in the coefficients of the matrix A of a least-squares problem produce a matrix Â with linearly dependent columns, the columns of A are "almost" linearly dependent. The estimates of the coefficients of the solution vector are tentative, since small changes in the data matrix A result in a system with a matrix Â having many solutions.

Solution: Inconsistent System - Minimum Length.

One might choose as "optimal" the solution vector x = (x₁, x₂, x₃)^T = w + s * x_h with minimum length. The length (norm) of x = w + s * x_h is:

norm(x) = sqrt(x²₁ + x²₂ + x²₃) = sqrt[841/1225+36*s^2+(-92/105+4*s)^2],

having a minimum for s⁺ = 0.0674. The solution of minimum length to the inconsistent system is:

x⁺ = w + s⁺ * x_h = (0.8286, -0.4044, -0.6066).

The singular value decomposition algorithm provides x⁺ = A⁺ * b = (0.8286, -0.4044, -0.6066), where A⁺ is the pseudoinverse of A.
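The minimizing scalar has a closed form: setting the derivative of norm(w + s * x_h)² with respect to s to zero gives s⁺ = -(w · x_h) / (x_h · x_h). The sketch below evaluates it with exact fractions, taking w₁ = 29/35 (whose square is the 841/1225 in the formula above) and w₃ = -92/105. It does not use the singular value decomposition; it only reproduces the numbers.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

w = [Fraction(29, 35), Fraction(0), Fraction(-92, 105)]   # 0.82857, 0, -0.87619
x_h = [0, -6, 4]

# The minimum-length solution is perpendicular to ker(A).
s_plus = -dot(w, x_h) / dot(x_h, x_h)
x_plus = [wi + s_plus * hi for wi, hi in zip(w, x_h)]

print(round(float(s_plus), 4))                # 0.0674
print([round(float(v), 4) for v in x_plus])   # [0.8286, -0.4044, -0.6066]
print(dot(x_plus, x_h))                       # 0
```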

Echelon Algorithm.

The Echelon Algorithm is similar to the Gauss-Jordan method and the Gaussian method. The Echelon Algorithm permits one to calculate the general solution of a system of linear equations having more columns than rows. Begin by writing the matrix A and the vector b side by side as shown in Tableau⁰ below. This matrix arrangement is called the augmented matrix.

Using the following row operations on the tableaux of the augmented matrix, convert the left hand matrix to echelon form.

1. exchange two rows to get the largest entry (in absolute value) onto the diagonal.

2. divide a row by the diagonal entry.

3. add a multiple of one row to another row.

Begin in the upper left hand corner of the left hand matrix. Proceed along the diagonal converting each column into the appropriate column of the identity matrix as shown in the sequence of tableaux. Exchange rows with the lower row that has the largest entry (in absolute value) for the column. If all entries on and below the diagonal of the column are zeros, proceed to the next column.

Tableau⁰

Row	x₁	x₂	x₃	x₄	b
R⁰₁	6	12	1	-5	7
R⁰₂	4	8	10	6	14
R⁰₃	1	2	3	2	4

Tableau¹

Row operation	x₁	x₂	x₃	x₄	b
R¹₁ = R⁰₁ / 6	1	2	0.1667	-0.8333	1.1667
R¹₂ = R⁰₂ - (4) * R¹₁	0	0	9.3333	9.3333	9.3333
R¹₃ = R⁰₃ - (1) * R¹₁	0	0	2.8333	2.8333	2.8333

Tableau²

Row operation	x₁	x₂	x₃	x₄	b
R²₁ = R¹₁ - (0.1667) * R²₂	1	2	0	-1	1
R²₂ = R¹₂ / 9.3333	0	0	1	1	1
R²₃ = R¹₃ - (2.8333) * R²₂	0	0	0	0	0

The last tableau contains the information one needs to obtain the general solution to the original linear equation problem A * x = b.

Echelon Tableau

x₁	x₂	x₃	x₄	b
1	2	0	-1	1
0	0	1	1	1
0	0	0	0	0

The solution to the Echelon Tableau yields the particular solution, x_p, and the entire set of solutions of the associated homogeneous system, ie all vectors in the Ker(A).

Vectors in the Ker(A) are linear combinations of the basis vectors, x_h1, x_h2, of the Ker(A) subspace.

x = x_p + s1 * x_h1 + s2 * x_h2:

[ x₁ ]   [ 1 ]        [ -2 ]        [  1 ]
[ x₂ ] = [ 0 ] + s1 * [  1 ] + s2 * [  0 ]
[ x₃ ]   [ 1 ]        [  0 ]        [ -1 ]
[ x₄ ]   [ 0 ]        [  0 ]        [  1 ]
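The row operations and the reading-off of the general solution can be sketched in Python as follows. This is my own minimal version of the procedure, using exact fractions rather than rounded decimals. The function names rref and general_solution are hypothetical, and the right-hand side b^T = (7, 14, 4) is an assumption that I inferred from the b column of Tableau¹ (1.1667 * 6 = 7, and so on).

```python
from fractions import Fraction

def rref(aug, ncols):
    """Reduce an augmented matrix [A | b] to reduced echelon form.

    ncols is the number of columns of A. Returns the reduced tableau
    and the list of pivot columns.
    """
    M = [[Fraction(v) for v in row] for row in aug]
    pivots, r = [], 0
    for c in range(ncols):
        if r == len(M):
            break
        # Row at or below r with the largest entry (in absolute value).
        k = max(range(r, len(M)), key=lambda i: abs(M[i][c]))
        if M[k][c] == 0:
            continue                      # no pivot here: go to the next column
        M[r], M[k] = M[k], M[r]           # 1. exchange two rows
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]    # 2. divide the row by the pivot entry
        for i in range(len(M)):           # 3. clear the rest of the column
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def general_solution(M, pivots, ncols):
    """Read x_p and a basis of Ker(A) off a consistent reduced tableau."""
    x_p = [Fraction(0)] * ncols
    for r, c in enumerate(pivots):
        x_p[c] = M[r][ncols]
    basis = []
    for f in range(ncols):
        if f in pivots:
            continue
        h = [Fraction(0)] * ncols         # one basis vector per free column
        h[f] = Fraction(1)
        for r, c in enumerate(pivots):
            h[c] = -M[r][f]
        basis.append(h)
    return x_p, basis

tableau = [[6, 12, 1, -5, 7],     # last column: assumed b = (7, 14, 4)
           [4, 8, 10, 6, 14],
           [1, 2, 3, 2, 4]]
M, pivots = rref(tableau, 4)
x_p, basis = general_solution(M, pivots, 4)
print([[int(v) for v in row] for row in M])
print([int(v) for v in x_p], [[int(v) for v in h] for h in basis])
```

Under these assumptions the reduced tableau is (1, 2, 0, -1 | 1), (0, 0, 1, 1 | 1), (0, 0, 0, 0 | 0), with x_p^T = (1, 0, 1, 0), x_h1^T = (-2, 1, 0, 0), and x_h2^T = (1, 0, -1, 1). The sketch does not test for inconsistency; a zero row of A with a nonzero entry in the b column would signal it.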


Singular Value Decomposition Algorithm.

Consider the same linear equation problem A*x = b solved with the Echelon Algorithm.

A =
[ 6   12    1   -5 ]
[ 4    8   10    6 ]
[ 1    2    3    2 ]

x^T = (x₁, x₂, x₃, x₄)

b^T = (7, 14, 4)

Using singular value decomposition yields the orthogonal matrices U and V, and the diagonal matrix S of singular values of A:

A * x = U * S * V^T * x = b,

where

U =
[ -0.658    0.7523   0.0342 ]
[ -0.7268  -0.6225  -0.2903 ]
[ -0.1971  -0.2158   0.9563 ]

S =
[ 17.9819   0         0   0 ]
[  0       10.8005    0   0 ]
[  0        0         0   0 ]

V^T =
[ -0.3922  -0.7844  -0.4737  -0.0815 ]
[  0.1674   0.3348  -0.5667  -0.734  ]
[ -0.9045   0.402    0.1005  -0.1005 ]
[  0       -0.3333   0.6667  -0.6667 ]

and x^T = (x₁, x₂, x₃, x₄).

Since the matrix S has 2 positive singular values, the rank of the matrix is 2. Furthermore, the columns of U and V provide orthonormal bases for the following subspaces:

Subspace	Orthonormal Bases
Im(A)	first 2 columns of U
Ker(A^T)	last column of U
Im(A^T)	first 2 columns of V
Ker(A)	last 2 columns of V

The Pseudoinverse is A⁺ = V * S⁺ * U^T

A⁺ =
[  0.026    0.0062   0.001  ]
[  0.052    0.0124   0.0019 ]
[ -0.0221   0.0518   0.0165 ]
[ -0.0481   0.0456   0.0156 ]

V =
[ -0.3922   0.1674  -0.9045   0      ]
[ -0.7844   0.3348   0.402   -0.3333 ]
[ -0.4737  -0.5667   0.1005   0.6667 ]
[ -0.0815  -0.734   -0.1005  -0.6667 ]

S⁺ =
[ 0.0556   0        0 ]
[ 0        0.0926   0 ]
[ 0        0        0 ]
[ 0        0        0 ]

U^T =
[ -0.658   -0.7268  -0.1971 ]
[  0.7523  -0.6225  -0.2158 ]
[  0.0342  -0.2903   0.9563 ]

Both the columns and rows of A are linearly dependent. Setting x⁺ = A⁺ * b yields p = A * x⁺, the projection of b onto the Im(A) subspace.

x⁺ = A⁺ * b

A⁺ =
[  0.026    0.0062   0.001  ]
[  0.052    0.0124   0.0019 ]
[ -0.0221   0.0518   0.0165 ]
[ -0.0481   0.0456   0.0156 ]

b^T = (7, 14, 4)

x⁺^T = (x₁, x₂, x₃, x₄) = (0.2727, 0.5455, 0.6364, 0.3636)

p = A * x⁺

A =
[ 6   12    1   -5 ]
[ 4    8   10    6 ]
[ 1    2    3    2 ]

x⁺^T = (0.2727, 0.5455, 0.6364, 0.3636)

p^T = (p₁, p₂, p₃) = (7, 14, 4)

The RHS vector, b, lies inside the Im(A). Consequently, the system A * x = b is consistent. Therefore, p = b.
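For a consistent system, x⁺ = A⁺ * b is the solution of minimum length, so it can be cross-checked without a singular value decomposition: start from any particular solution and remove its component in Ker(A). The sketch below does this with exact fractions. The vectors x_p, x_h1, and x_h2 are the ones I read off the Echelon Tableau with right-hand column (1, 1, 0); treat them as assumptions.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

x_p = [1, 0, 1, 0]     # particular solution (assumed, from the Echelon Tableau)
h1 = [-2, 1, 0, 0]     # basis of Ker(A) (assumed, from the Echelon Tableau)
h2 = [1, 0, -1, 1]

# Minimise |x_p + s1*h1 + s2*h2|^2. Setting the gradient to zero gives
# the 2 by 2 system G * s = -c with G = H^T * H and c = H^T * x_p.
g11, g12, g22 = dot(h1, h1), dot(h1, h2), dot(h2, h2)
c1, c2 = dot(h1, x_p), dot(h2, x_p)
det = g11 * g22 - g12 * g12
s1 = Fraction(-g22 * c1 + g12 * c2, det)
s2 = Fraction(g12 * c1 - g11 * c2, det)

x_min = [p + s1 * a + s2 * c for p, a, c in zip(x_p, h1, h2)]
print(x_min)                                  # fractions 3/11, 6/11, 7/11, 4/11
print([round(float(v), 4) for v in x_min])    # [0.2727, 0.5455, 0.6364, 0.3636]
```

The rounded values agree with x⁺ above.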

Overdetermined Systems
With # Rows m >= # Columns n

Suppose one wants to derive a linear equation model of the form A * x = b from a data set with a design (data) matrix A having more rows than columns. The columns of A and the column vector b represent measurable factors of the phenomena of interest. Each row of A represents one observation of the measured factors of the phenomena.

Since the matrix A has more rows than columns, one expects that A * x = b will be inconsistent, ie that b will lie outside the Im(A). Consequently, b will not be represented exactly as a linear combination of the columns of A.

I discuss least squares methods, also called multiple regression methods, on the Statistical web pages. Here I derive methods that statisticians and econometricians use to evaluate the least square (multiple regression) estimates for overdetermined systems of linear equations.

Least Squares (Multiple Regression) Solution.

With an inconsistent system, the least squares solution obtains a vector x⁺ from the data such that the vector p given by:

p = A * x⁺

is the projection of the vector b onto Im(A).

To be the projection of the vector b onto Im(A), p must be the vector in Im(A) that minimizes the distance between p and b, ie:

minimize sqrt{ (b₁ - p₁)² + (b₂ - p₂)² + . . . + (b_m - p_m)² }

giving the name "least squares" to this solution.

As the number of observations increases, one hopes that one's understanding of the phenomena improves. This could be the case if the least squares estimated values of each parameter of the vector x⁺ converge as the number of observations increases.

Furthermore, one might deduce that a model of the form:

z = a₁ * x⁺₁ + a₂ * x⁺₂ + . . . + a_n * x⁺_n,

for a given instance of factors represented by the vector

a^T = (a₁, a₂, . . . a_n)

could be used to predict the measured value of the factor represented in the b column, if the a vector was observed.

Normal Equations.

The normal equations provide a direct method for obtaining the least squares parameters:

x⁺^T = (x⁺₁, x⁺₂, . . . x⁺_n)

to the linear equations system A * x = b.

To reduce the dimensions of the system, multiply the equation A * x = b by the matrix A^T, the transpose of A. The resulting equations are called the normal equations:

A^T * A * x = A^T * b.

Example.

As an experimenter, I want to summarize the data set with an equation in R³ space with standard coordinates (x, y, z):

z = x⁺₁ * x + x⁺₂ * y

described by the linear equations system A * x = b. The matrix A has two columns representing the independent variables (factors) of my model, x and y. The RHS vector b represents the dependent variable, z. I want to estimate the coefficients x⁺₁ and x⁺₂ using least squares. Perhaps, I want to use the equation to predict values of the dependent variable z for as yet unknown values of the independent variables x and y.

A =
[  2.6    4   ]
[ -5.9   -3.5 ]
[ -1.6   -4.6 ]
[ -7.7    4.1 ]
[  4     -7.3 ]
[  2.2   -1.2 ]
[ -4.2   -4.1 ]
[  1.3    5.1 ]

x^T = (x₁, x₂)

b^T = (5.1, -5.2, -0.8, -2.3, -1.35, 0.7, -5.85, 3.1)

The normal equations system A^T * A * x = A^T * b is:

A^T * A * x = A^T * b, where A, x^T = (x₁, x₂), and b are the matrix and vectors of the example, and

A^T =
[ 2.6   -5.9   -1.6   -7.7    4     2.2   -4.2    1.3 ]
[ 4     -3.5   -4.6    4.1   -7.3  -1.2   -4.1    5.1 ]

Multiplying by the matrix A^T yields the reduced system with 2 equations and 2 unknowns:

(A^T*A) * x = (A^T*b):

[ 143.59    -1.15 ]   [ x₁ ]   [ 87.67 ]
[  -1.15   163.77 ] * [ x₂ ] = [ 81.66 ]

The matrix (A^T*A) is symmetric. If the data matrix A has linearly independent columns, (A^T*A) has an inverse. The normal equations provide a system with a square coefficient matrix, solvable by standard methods.

Proceeding with the inverse of (A^T*A):

x⁺ = (A^T*A)^-1 * (A^T*b):

[ x⁺₁ ]   [ 0.007   0      ]   [ 87.67 ]   [ 0.6146 ]
[ x⁺₂ ] = [ 0       0.0061 ] * [ 81.66 ] = [ 0.5029 ]

The least squares solution using the normal equations is x⁺ = ( 0.6146, 0.5029 ).
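The normal equations for this example can be reproduced with a short Python sketch. It forms the entries of (A^T*A) and (A^T*b) directly and solves the 2 by 2 system with the explicit inverse; the variable names are mine.

```python
A = [[2.6, 4.0], [-5.9, -3.5], [-1.6, -4.6], [-7.7, 4.1],
     [4.0, -7.3], [2.2, -1.2], [-4.2, -4.1], [1.3, 5.1]]
b = [5.1, -5.2, -0.8, -2.3, -1.35, 0.7, -5.85, 3.1]

# Entries of the symmetric matrix A^T * A and of the vector A^T * b.
g11 = sum(r[0] * r[0] for r in A)
g12 = sum(r[0] * r[1] for r in A)
g22 = sum(r[1] * r[1] for r in A)
c1 = sum(r[0] * bi for r, bi in zip(A, b))
c2 = sum(r[1] * bi for r, bi in zip(A, b))

# Solve the 2 by 2 system with the explicit inverse of A^T * A.
det = g11 * g22 - g12 * g12
x1 = (c1 * g22 - g12 * c2) / det
x2 = (g11 * c2 - g12 * c1) / det

print(round(g11, 2), round(g12, 2), round(g22, 2))   # 143.59 -1.15 163.77
print(round(c1, 2), round(c2, 2))                    # 87.67 81.66
print(round(x1, 4), round(x2, 4))                    # 0.6146 0.5029
```

Working with the unrounded inverse matters here: the off-diagonal entries of (A^T*A)^-1 are small but not exactly zero.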

The projection of b onto Im(A) is p = A * x⁺. Comparing p and b and computing b - p:

i	a₁	a₂	b	p	(b-p)
1	2.6	4	5.1	3.6097	1.4903
2	-5.9	-3.5	-5.2	-5.3864	0.1864
3	-1.6	-4.6	-0.8	-3.2969	2.4969
4	-7.7	4.1	-2.3	-2.6702	0.3702
5	4	-7.3	-1.35	-1.2131	-0.1369
6	2.2	-1.2	0.7	0.7486	-0.0486
7	-4.2	-4.1	-5.85	-4.6433	-1.2067
8	1.3	5.1	3.1	3.364	-0.264

The distance from Im(A) to b, the length of the vector (b-p), is 3.19, found by computing the square root of the sum of the squares of the (b-p) column. (Im(A) is a subspace of R⁸, while b and p are vectors in R⁸.)
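The distance, and the fact that (b-p) is perpendicular to the columns of A, can be checked as follows. The sketch uses the rounded solution x⁺ = (0.6146, 0.5029) from the text, so the perpendicularity holds only up to that rounding.

```python
import math

A = [[2.6, 4.0], [-5.9, -3.5], [-1.6, -4.6], [-7.7, 4.1],
     [4.0, -7.3], [2.2, -1.2], [-4.2, -4.1], [1.3, 5.1]]
b = [5.1, -5.2, -0.8, -2.3, -1.35, 0.7, -5.85, 3.1]
x_plus = [0.6146, 0.5029]      # rounded least squares solution from the text

p = [r[0] * x_plus[0] + r[1] * x_plus[1] for r in A]    # projection A * x+
resid = [bi - pi for bi, pi in zip(b, p)]               # the vector b - p
distance = math.sqrt(sum(r * r for r in resid))

# A^T * (b - p): near zero, because b - p is perpendicular to Im(A).
at_resid = [sum(row[j] * r for row, r in zip(A, resid)) for j in range(2)]

print(round(distance, 2))                  # 3.19
print([round(v, 2) for v in at_resid])     # both entries close to zero
```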

For each row i of the data matrix A = [a₁, a₂], the triples (a_i,1, a_i,2, b_i) and (a_i,1, a_i,2, p_i) are points in R³ with standard coordinates (x, y, z). Furthermore, the least squares solution can be represented by the plane:

z = 0.6146 * x + 0.5029 * y

The plane is displayed below. If the point (a_i,1, a_i,2, b_i) lies above the plane, a blue line joins it to the point (a_i,1, a_i,2, p_i) on the plane. If (a_i,1, a_i,2, b_i) lies below the plane, a red line joins it to (a_i,1, a_i,2, p_i).

Least Squares System A*x = b


The functional form,

z = x⁺₁ * x + x⁺₂ * y,

ensures that the solution plane passes through the origin.

In the example of the next section, the estimated solution plane is not restricted to pass through the origin.

Singular Value Decomposition and Least Squares.

The "normal equations" least squares method of solving systems of linear equations computes the matrix (A^T * A). For systems with many observations, the data matrix A has many rows. Multiplying two matrices using a computer incurs the risk of roundoff errors distorting the components of the matrix (A^T * A) from their true values. Furthermore, the matrix (A^T * A) may be ill-conditioned if the columns of A are nearly linearly dependent. Experimenters often include as many factors as possible in their design matrix A, hoping to improve the "explanatory power" of their model. The probability that subtle interdependencies among the columns of A will render those columns nearly linearly dependent increases with the number of included factors.

The singular value decomposition method controls for such errors. Firstly, it controls roundoff errors by decomposing the data matrix A using Householder and Jacobi transformations, known for their stability. This decomposition is obtained without computing (A^T * A). Secondly, singular value decomposition controls for linear dependence (and roundoff error) by setting very small singular values to zero.
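The roundoff risk in forming (A^T * A) can be illustrated with a small textbook-style matrix that is not part of this page's data: two columns that differ only by entries of size eps = 1e-8. In double precision, 1 + eps² rounds to 1, so the computed (A^T * A) is exactly singular even though the columns of A are linearly independent. A sketch comparing floating point with exact rational arithmetic:

```python
from fractions import Fraction

eps = 1e-8
A = [[1.0, 1.0],      # illustrative matrix, not taken from the examples above
     [eps, 0.0],
     [0.0, eps]]

def gram(M):
    """Entries g11, g12, g22 of the 2 by 2 matrix M^T * M."""
    g11 = sum(r[0] * r[0] for r in M)
    g12 = sum(r[0] * r[1] for r in M)
    g22 = sum(r[1] * r[1] for r in M)
    return g11, g12, g22

# Floating point: 1 + eps**2 rounds to 1, so A^T * A looks singular.
g11, g12, g22 = gram(A)
det_float = g11 * g22 - g12 * g12

# Exact rational arithmetic on the same entries: the determinant is positive.
e11, e12, e22 = gram([[Fraction(v) for v in row] for row in A])
det_exact = e11 * e22 - e12 * e12

print(det_float)        # 0.0
print(det_exact > 0)    # True
```

Methods that work on A directly, such as the singular value decomposition, avoid squaring the small entries in this way.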

Example.

The matrix A has two columns representing the independent variables (factors) of my model, x and y. The RHS vector b represents the dependent variable, z. As an experimenter, I want to summarize the data set with an equation in R³ space with standard coordinates (x, y, z):

z = x⁺₁ + x⁺₂ * x + x⁺₃ * y

described by the linear equations system A * x = b. I need to estimate three coefficients x⁺₁, x⁺₂, and x⁺₃ since I do not require the solution plane to pass through the origin. To obtain a z-intercept coefficient given by x⁺₁, I append a column of ones to the front of the matrix A:

A =
[ 1   -3     -2.6 ]
[ 1    7.9   -4.6 ]
[ 1   -3.5    1.9 ]
[ 1    3.3   -5.4 ]
[ 1   -6.8    4.9 ]
[ 1   -6     -1.7 ]
[ 1   -6.6   -5.2 ]
[ 1    4.5   -0.8 ]

x^T = (x₁, x₂, x₃)

b^T = (-4, 3.05, 2.7, 2.15, -1.55, -1.55, -4.3, 4.25)

Singular value decomposition yields the orthogonal matrices U and V, and the diagonal matrix S of singular values of A:

A * x = U * S * V^T * x = b,

where

U =
[  0.1302  -0.3449  -0.1664 ]
[ -0.5571  -0.2098  -0.3775 ]
[  0.2462   0.0567  -0.4478 ]
[ -0.2985  -0.4189  -0.1798 ]
[  0.5002   0.2373  -0.542  ]
[  0.3261  -0.3498  -0.1314 ]
[  0.2948  -0.6913   0.1186 ]
[ -0.2819   0.0419  -0.5214 ]

S =
[ 15.9827    0        0      ]
[  0        10.16     0      ]
[  0         0        2.2796 ]

V^T =
[  0.0225  -0.9517   0.3061 ]
[ -0.1652   0.2985   0.94   ]
[ -0.986   -0.0718  -0.1505 ]

x^T = (x₁, x₂, x₃)

b^T = (-4, 3.05, 2.7, 2.15, -1.55, -1.55, -4.3, 4.25)

Since the matrix S has 3 positive singular values, the rank of the matrix A is 3. Computing A⁺ = V * S⁺ * U^T, the least squares solution to A * x = b is:

x⁺ = A⁺ * b.

A⁺ =
[  0.0778   0.1659   0.1931   0.0842   0.2313   0.063   -0.0396   0.2244 ]
[ -0.0127   0.0389   0.0011   0.0111  -0.0057  -0.0256  -0.0416   0.0344 ]
[ -0.0184  -0.0052   0.0395  -0.0326   0.0673  -0.0174  -0.0661   0.0329 ]

b^T = (-4, 3.05, 2.7, 2.15, -1.55, -1.55, -4.3, 4.25)

x⁺^T = (x₁, x₂, x₃) = (1.5655, 0.5698, 0.4416)

The least squares solution using singular value decomposition is x⁺ = ( 1.5655, 0.5698, 0.4416 ).

The projection of b onto Im(A) is p = A * x⁺. Comparing p and b and computing b - p:

i	a₁	a₂	a₃	b	p	(b-p)
1	1	-3	-2.6	-4	-1.2922	-2.7078
2	1	7.9	-4.6	3.05	4.036	-0.986
3	1	-3.5	1.9	2.7	0.41	2.29
4	1	3.3	-5.4	2.15	1.0614	1.0886
5	1	-6.8	4.9	-1.55	-0.1458	-1.4042
6	1	-6	-1.7	-1.55	-2.6043	1.0543
7	1	-6.6	-5.2	-4.3	-4.4917	0.1917
8	1	4.5	-0.8	4.25	3.7765	0.4735

The distance from Im(A) to b, the length of the vector (b-p), is 4.252, found by computing the square root of the sum of the squares of the (b-p) column. (Im(A) is a subspace of R⁸, while b and p are vectors in R⁸.)
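As a cross-check of the singular value decomposition result, the same fit can be computed from the 3 by 3 normal equations. The sketch below swaps in that technique, solved by Gaussian elimination with partial pivoting, in place of the svd used above; the function name solve is my own. For this well-conditioned data set the two approaches should agree with the values above to about three decimals.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    T = [row[:] + [r] for row, r in zip(M, rhs)]      # augmented matrix
    for c in range(n):
        k = max(range(c, n), key=lambda i: abs(T[i][c]))
        T[c], T[k] = T[k], T[c]
        for i in range(c + 1, n):
            f = T[i][c] / T[c][c]
            T[i] = [vi - f * vc for vi, vc in zip(T[i], T[c])]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x

xs = [-3, 7.9, -3.5, 3.3, -6.8, -6, -6.6, 4.5]
ys = [-2.6, -4.6, 1.9, -5.4, 4.9, -1.7, -5.2, -0.8]
b = [-4, 3.05, 2.7, 2.15, -1.55, -1.55, -4.3, 4.25]
A = [[1.0, x, y] for x, y in zip(xs, ys)]            # column of ones in front

# Normal equations (A^T * A) * x = A^T * b.
G = [[sum(r[i] * r[j] for r in A) for j in range(3)] for i in range(3)]
c = [sum(r[i] * bi for r, bi in zip(A, b)) for i in range(3)]
x_plus = solve(G, c)
print([round(v, 3) for v in x_plus])
```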

For each row i of the data matrix A = [a₁, a₂, a₃], the triples (a_i,2, a_i,3, b_i) and (a_i,2, a_i,3, p_i) are points in R³ with standard coordinates (x, y, z). Furthermore, the least squares solution can be represented by the plane:

z = 1.5655 + 0.5698 * x + 0.4416 * y

The plane is displayed below. If the point (a_i,2, a_i,3, b_i) lies above the plane, a blue line joins it to the point (a_i,2, a_i,3, p_i) on the plane. If (a_i,2, a_i,3, b_i) lies below the plane, a red line joins it to (a_i,2, a_i,3, p_i).

Least Squares System A*x = b


A Square Matrix.

If A is a square matrix of dimension n, the system of linear equations problem consists of n equations in n unknowns. If the rank of A, rank(A), equals n, its columns and rows are linearly independent and the problem has a unique solution.

The variables x = [x_i] that solve the system of linear equations A * x = b can be obtained by various methods.

The methods shown below are based on the techniques for decomposing a matrix that I discussed on the matrices web page. These methods are the A inverse solution using the Gauss-Jordan method, the P*A = L*U solution using the Gaussian method, the A = Q*R solution using Householder transformations, and the svd solution using singular value decomposition.

The Gauss-Jordan Inverse Method

The Gauss-Jordan inverse method uses row operations on the initial tableau [ A | I ] to convert the LHS matrix A to the identity matrix I, which converts the RHS matrix I to the inverse matrix A^-1.

Solve A * x = b with A^-1.

x = A^-1 * b, with

A^-1 =
[  2   -3    1    0      0    ]
[ -3    6   -4    1      0    ]
[  1   -4    6   -4      1    ]
[  0    1   -4    6     -2.8  ]
[  0    0    1   -2.8    1.64 ]

b^T = (310, 325, 356, 405, 475)

Solution vector x^T = (x₁, x₂, x₃, x₄, x₅) = (1, 1, 1, 1, 1)
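The Gauss-Jordan inverse method can be sketched in a few lines of Python. Since the 5 by 5 matrix A of this example is not listed on the page, the sketch uses the 3 by 3 nonhomogeneous example from the top of the page instead; the function name gauss_jordan_inverse is my own, and exact fractions stand in for floating point.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Invert a square matrix by row operations on the tableau [ A | I ]."""
    n = len(A)
    T = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Bring the largest entry (in absolute value) onto the diagonal.
        k = max(range(c, n), key=lambda i: abs(T[i][c]))
        if T[k][c] == 0:
            raise ValueError("matrix is singular")
        T[c], T[k] = T[k], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for i in range(n):
            if i != c:
                f = T[i][c]
                T[i] = [vi - f * vc for vi, vc in zip(T[i], T[c])]
    return [row[n:] for row in T]        # the right half now holds A^-1

def matvec(M, v):
    """Matrix-vector product M * v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

A = [[6, 2, 2],
     [4, -3, 5],
     [1, 6, 4]]
b = [6, -4, 3]
A_inv = gauss_jordan_inverse(A)
x = matvec(A_inv, b)
print([int(v) for v in x])   # [1, 1, -1]
```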

The Gaussian Decomposition Method

The Gaussian decomposition method factors the matrix A into the product of a lower triangular matrix L and an upper triangular matrix U, keeping track of row switches to construct a permutation matrix P so that P * A = L * U.

Solve A * x = b using P * A * x = L * U * x = P * b.

L * U * x = P * b, with

L =
[ 1        0       0        0        0 ]
[ 0.7333   1       0        0        0 ]
[ 0.7467   0.65    1        0        0 ]
[ 0.7867   0.35    0.5385   1        0 ]
[ 0.8667   0.125   0.1923   0.3571   1 ]

U =
[ 75   80        90      105       125      ]
[  0   -2.6667   -7      -12       -16.6667 ]
[  0    0        -0.65    -1.6      -2.5    ]
[  0    0         0       -0.5385   -1.1538 ]
[  0    0         0        0        -0.3571 ]

P =
[ 0   0   0   0   1 ]
[ 1   0   0   0   0 ]
[ 0   1   0   0   0 ]
[ 0   0   1   0   0 ]
[ 0   0   0   1   0 ]

x^T = (x₁, x₂, x₃, x₄, x₅)

b^T = (310, 325, 356, 405, 475)

The solution vector x is obtained in two steps. First, solve for y in L * y = P * b by forward substitution:

L * y = P * b, with

L =
[ 1        0       0        0        0 ]
[ 0.7333   1       0        0        0 ]
[ 0.7467   0.65    1        0        0 ]
[ 0.7867   0.35    0.5385   1        0 ]
[ 0.8667   0.125   0.1923   0.3571   1 ]

y^T = (y₁, y₂, y₃, y₄, y₅)

(P * b)^T = (475, 310, 325, 356, 405)

Solution vector y^T = (475, -38.33333, -4.75, -1.69231, -0.35714)

Next, solve for x in U * x = y by backward substitution:

U =
[ 75   80        90      105       125      ]
[  0   -2.6667   -7      -12       -16.6667 ]
[  0    0        -0.65    -1.6      -2.5    ]
[  0    0         0       -0.5385   -1.1538 ]
[  0    0         0        0        -0.3571 ]

x^T = (x₁, x₂, x₃, x₄, x₅)

y^T = (475, -38.3333, -4.75, -1.6923, -0.3571)

Solution vector x^T = (1, 1, 1, 1, 1)
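The two-step solution with P * A = L * U can be sketched as follows. As the 5 by 5 matrix A is not listed on the page, the sketch again uses the 3 by 3 nonhomogeneous example from the top of the page. The function names are mine, the permutation is stored as a list of row indices rather than as a matrix P, and exact fractions are used.

```python
from fractions import Fraction

def lu_decompose(A):
    """Return (perm, L, U) such that the rows of A taken in order perm equal L * U."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))
    for c in range(n):
        k = max(range(c, n), key=lambda i: abs(U[i][c]))   # partial pivoting
        if U[k][c] == 0:
            raise ValueError("matrix is singular")
        U[c], U[k] = U[k], U[c]
        L[c], L[k] = L[k], L[c]
        perm[c], perm[k] = perm[k], perm[c]
        for i in range(c + 1, n):
            f = U[i][c] / U[c][c]
            L[i][c] = f                                    # store the multiplier
            U[i] = [ui - f * uc for ui, uc in zip(U[i], U[c])]
    for i in range(n):
        L[i][i] = Fraction(1)
    return perm, L, U

def forward_substitution(L, rhs):
    """Solve L * y = rhs for lower triangular L."""
    y = []
    for i in range(len(L)):
        y.append((rhs[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y

def back_substitution(U, y):
    """Solve U * x = y for upper triangular U."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[6, 2, 2],
     [4, -3, 5],
     [1, 6, 4]]
b = [6, -4, 3]
perm, L, U = lu_decompose(A)
Pb = [b[i] for i in perm]                 # P * b
y = forward_substitution(L, Pb)
x = back_substitution(U, y)
print(perm, [int(v) for v in x])          # [0, 2, 1] [1, 1, -1]
```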

The Householder QR Method

The Householder decomposition method factors the matrix A into the product of an orthogonal matrix Q (a matrix with orthonormal columns) and an upper triangular matrix R, such that A = Q * R.

Solve A * x = b using A * x = Q * R * x = b.

Q =
[ -0.3939  -0.73     0.5585   0        0      ]
[ -0.4011  -0.3366  -0.7228  -0.4509   0      ]
[ -0.4226   0.0152  -0.2782   0.8106   0.2945 ]
[ -0.4656   0.3046   0.0697   0.0751  -0.8246 ]
[ -0.5372   0.5107   0.2886  -0.3661   0.483  ]

R =
[ -139.6138  -146.626   -161.0443  -183.6639  -215.7022 ]
[    0          2.4143     6.513     11.2295    15.5124 ]
[    0          0          0.5585     1.5112     2.4155 ]
[    0          0          0         -0.4509    -0.993  ]
[    0          0          0          0          0.2945 ]

x^T = (x₁, x₂, x₃, x₄, x₅)

b^T = (310, 325, 356, 405, 475)

Multiplying by Q^T gives R * x = Q^T * b.

R =
[ -139.6138  -146.626   -161.0443  -183.6639  -215.7022 ]
[    0          2.4143     6.513     11.2295    15.5124 ]
[    0          0          0.5585     1.5112     2.4155 ]
[    0          0          0         -0.4509    -0.993  ]
[    0          0          0          0          0.2945 ]

x^T = (x₁, x₂, x₃, x₄, x₅)

(Q^T * b)^T = (-846.6501, 35.6692, 4.4853, -1.444, 0.2945)

These equations are solved for x by back substitution yielding:

Solution vector x^T
1	1	1	1	1
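A compact floating-point sketch of the Householder QR solution follows. It is my own minimal version, not the code behind the tableaux above, and because the 5 by 5 matrix A is not listed on the page it is applied to the 3 by 3 nonhomogeneous example from the top of the page. Each reflection H = I - 2 * v * v^T / (v^T * v) zeroes one column below the diagonal.

```python
import math

def householder_qr(A):
    """Factor a square matrix as A = Q * R using Householder reflections."""
    n = len(A)
    R = [[float(v) for v in row] for row in A]
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for c in range(n - 1):
        # Reflection vector built from column c, on and below the diagonal.
        x = [R[i][c] for i in range(c, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])   # sign chosen to avoid cancellation
        vv = sum(t * t for t in v)
        for j in range(n):                    # R <- H * R
            s = 2.0 * sum(v[i] * R[c + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[c + i][j] -= s * v[i]
        for i in range(n):                    # Q <- Q * H
            s = 2.0 * sum(Q[i][c + k] * v[k] for k in range(len(v))) / vv
            for k in range(len(v)):
                Q[i][c + k] -= s * v[k]
    return Q, R

def solve_qr(A, b):
    """Solve A * x = b via R * x = Q^T * b and back substitution."""
    Q, R = householder_qr(A)
    n = len(A)
    qtb = [sum(Q[i][j] * b[i] for i in range(n)) for j in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (qtb[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

A = [[6, 2, 2],
     [4, -3, 5],
     [1, 6, 4]]
b = [6, -4, 3]
x = solve_qr(A, b)
print([round(v, 10) for v in x])   # approximately (1, 1, -1)
```

With this sign choice the first diagonal entry of R comes out negative when the first entry of the column is positive, as it does in the tableau above.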

The Singular Value Decomposition Method

The singular value decomposition method factors the matrix A into the product A = U * S* V^T, where U consists of the eigenvectors of A * A^T, V consists of the eigenvectors of A^T * A, and S consists of the square roots of the eigenvalues of A * A^T (which equal the eigenvalues of A^T * A).

Solve A * x = b using A * x = U * S * V^T * x = b.

where

U =
[ -0.3631   0.642   -0.5234   0.3778  -0.1982 ]
[ -0.3816   0.4404   0.1746  -0.6017   0.5176 ]
[ -0.4196   0.1006   0.6398  -0.0213  -0.6356 ]
[ -0.4791  -0.2708   0.2518   0.6143   0.5063 ]
[ -0.5629  -0.5572  -0.472   -0.3427  -0.1801 ]

S =
[ 384.0674    0        0        0        0      ]
[   0        10.1448   0        0        0      ]
[   0         0        0.5623   0        0      ]
[   0         0        0        0.1488   0      ]
[   0         0        0        0        0.0767 ]

V^T =
[ -0.3631  -0.3816  -0.4196  -0.4791  -0.5629 ]
[  0.642    0.4404   0.1006  -0.2708  -0.5572 ]
[ -0.5234   0.1746   0.6398   0.2518  -0.472  ]
[  0.3778  -0.6017  -0.0213   0.6143  -0.3427 ]
[ -0.1982   0.5176  -0.6356   0.5063  -0.1801 ]

x^T = (x₁, x₂, x₃, x₄, x₅)

b^T = (310, 325, 356, 405, 475)

Since the matrix S has 5 positive singular values, the rank of the matrix A is 5. Computing A⁺ = V * S⁺ * U^T, the solution to A * x = b is:

x = A⁺ * b, with

A⁺ =
[  2   -3    1    0      0    ]
[ -3    6   -4    1      0    ]
[  1   -4    6   -4      1    ]
[  0    1   -4    6     -2.8  ]
[  0    0    1   -2.8    1.64 ]

b^T = (310, 325, 356, 405, 475)

Solution vector x^T = (x₁, x₂, x₃, x₄, x₅) = (1, 1, 1, 1, 1)
