Egwald Statistics: Multiple Regression
Linear and Restricted Multiple Regression
by
Elmer G. Wiens
The Regression Model
Let's use some of the notation from the popular statistics textbook:
Weisberg, Sanford. Applied Linear Regression, Second Edition. New York: Wiley, 1985.
Let X be the mxn matrix of predictors, Y the mx1 vector of responses, and ß the nx1 vector of unknown parameters (coefficients). I assume a linear relation of the form:
Y = X * ß
If I want an intercept term, the first column of X must be all 1's. Also, I require m > n and X to have full column rank (the columns are linearly independent).
Householder's Algorithm:
Decompose the matrix X into:
X = Q*U
where Q is an mxm orthogonal matrix and U is an mxn matrix with:
the first n rows of U = V, an nxn upper triangular (non-singular) matrix; the last m-n rows of U = W, an (m-n)xn matrix whose entries are all 0's.
Rewrite the regression problem as:
Y = Q*U*ß
or
U*ß = Q'*Y = b, an mx1 vector,
where the superscript ' denotes matrix transposition.
Note that Q orthogonal means Q*Q' = Q'*Q = I, the mxm identity matrix.
Let b1 = the first n components of b; b2 = the last m-n components of b.
Taking advantage of the structure of U, write:
V*ß = b1 (*)
The linear problem (*) is equivalent to the original problem. Moreover:
- I can solve for ß by back substitution.
- The regression problem's residual = ||b2||, the norm of b2.
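To make the steps above concrete, here is a minimal sketch using NumPy's Householder-based QR routine and SciPy's triangular solver. The variable names (X, Y, V, b1, b2, beta) mirror the notation on this page; the data are made up purely for illustration.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Made-up example: m = 6 observations, n = 3 parameters (intercept + 2 predictors).
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
Y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 12.9])
m, n = X.shape

# Full QR decomposition X = Q*U via Householder reflections.
Q, U = np.linalg.qr(X, mode="complete")   # Q is mxm, U is mxn
V = U[:n, :]                              # nxn upper triangular block of U

# b = Q'*Y; split into b1 (first n components) and b2 (last m-n components).
b = Q.T @ Y
b1, b2 = b[:n], b[n:]

# Solve V*beta = b1 by back substitution (V is upper triangular).
beta = solve_triangular(V, b1)

residual_norm = np.linalg.norm(b2)        # ||b2|| = residual of the fit
print(beta, residual_norm)
```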
That solves the main part of the problem. The statistical terms can then be calculated as follows (a code sketch follows the list):
- Residual sum of squares: RSS = ||b2||^2
- Residual mean square (MSRSS): s2 = RSS/(m-n)
- Estimated variance of ß: var(ß) = s2 * (V'*V)^(-1), where ^(-1) denotes the matrix inverse. Note: var(ß) = VC, an nxn symmetric matrix - the variance-covariance matrix of ß
- Standard error of parameter ßi: stderror(i) = sqrt(VC(i,i))
- The t-value of parameter ßi: t-val(i) = ßi / stderror(i)
- Sum of squares of Y: YY = Y'*Y
- Sum of Y: sumY = sum(Y) = sum(Y1,Y2,...,Ym)
- Total corrected sum of squares: SYY = YY - sumY*sumY/m
- Regression sum of squares: SSreg = SYY - RSS
- Coefficient of determination: R2 = SSreg/SYY
- Adjusted coefficient of determination: R2b = 1 - (1 -R2)*(m-1)/(m-n)
- Regression mean square: MSSreg = SSreg/(n-1)
- F-test for regression: compare F = MSSreg / MSRSS with the F(n-1,m-n) distribution
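A minimal sketch of these summary statistics, continuing from the QR sketch above (beta, V, b2, Y, m, n are assumed to be already computed there); the formulas follow the list item by item.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.stats import f as f_dist       # only needed for the F-test p-value

RSS = float(b2 @ b2)                      # residual sum of squares ||b2||^2
s2 = RSS / (m - n)                        # residual mean square (MSRSS)

iV = solve_triangular(V, np.eye(n))       # V^{-1} by back substitution (see Notes below)
VC = s2 * (iV @ iV.T)                     # variance-covariance matrix of beta
stderror = np.sqrt(np.diag(VC))           # standard errors of the parameters
t_val = beta / stderror                   # t-values

sumY = Y.sum()
SYY = float(Y @ Y) - sumY * sumY / m      # total corrected sum of squares
SSreg = SYY - RSS                         # regression sum of squares
R2 = SSreg / SYY                          # coefficient of determination
R2b = 1.0 - (1.0 - R2) * (m - 1) / (m - n)   # adjusted R^2
MSSreg = SSreg / (n - 1)                  # regression mean square
F = MSSreg / s2                           # compare with F(n-1, m-n)
p_value = f_dist.sf(F, n - 1, m - n)
```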
Analysis of Variance

| Source     | Sum of Squares | Degrees of Freedom | Mean Square | F(n-1,m-n) |
|------------|----------------|--------------------|-------------|------------|
| Regression | SSreg          | n-1                | MSSreg      | F          |
| Residual   | RSS            | m-n                | MSRSS       |            |
| Total      | SYY            | m-1                |             |            |
Notes:
1. Do not form the matrix X'*X and then compute its inverse. It takes too much computer time and could be numerically unstable.
2. Do not compute (V'*V)^(-1). Solve for each column of the matrix iV, the inverse matrix of V, by back substitution:
V * iV = I
where I is the nxn identity matrix. Then (V'*V)^(-1) = iV*iV', a symmetric matrix; compute only its diagonal if you just need the standard errors of the parameters.
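A small sketch of Note 2, solving V * iV = I one column at a time by back substitution rather than forming (V'*V)^(-1) directly; V, n, and s2 are assumed to come from the sketches above.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Solve V * iV = I column by column; each solve is a back substitution
# because V is upper triangular.
iV = np.empty((n, n))
for j in range(n):
    e_j = np.zeros(n)
    e_j[j] = 1.0
    iV[:, j] = solve_triangular(V, e_j)

VtV_inv = iV @ iV.T                            # (V'*V)^{-1}, symmetric
stderr_only = np.sqrt(s2 * np.diag(VtV_inv))   # diagonal suffices for standard errors
```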
Restricted Least Squares:
Having obtained the unrestricted estimate of ß, suppose the parameters must satisfy the linear restrictions:
r = R * ß,
where R is a known restriction matrix (one row per restriction) and r a known vector.
compute the restricted estimate rß as:
Let S = (V'*V)^(-1) * R' * [R * (V'*V)^(-1) * R']^(-1)
then rß = ß + S * [r - R*ß]
- The residual = ||Y - X * rß||
- Residual sum of squares: RSS = ||Y - X * rß||^2
- Residual mean square (MSRSS): s2 = RSS/(m-n)
- Estimated variance of rß: var(rß)= s2 * (I - S * R) * (V'*V)^(-1)
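A sketch of the restricted estimator, assuming beta, V, iV, X, Y, m, and n from the sketches above; the single restriction shown (the two slope coefficients sum to 1) is purely illustrative.

```python
import numpy as np

# Illustrative restriction r = R*beta: the two slope coefficients sum to 1.
R = np.array([[0.0, 1.0, 1.0]])           # one restriction row, n columns
r = np.array([1.0])

VtV_inv = iV @ iV.T                       # (V'*V)^{-1} from back substitution
S = VtV_inv @ R.T @ np.linalg.inv(R @ VtV_inv @ R.T)
rbeta = beta + S @ (r - R @ beta)         # restricted estimate rß

resid = Y - X @ rbeta
RSS_r = float(resid @ resid)              # ||Y - X*rß||^2
s2_r = RSS_r / (m - n)                    # residual mean square as defined above
var_rbeta = s2_r * (np.eye(n) - S @ R) @ VtV_inv   # estimated variance of rß
```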