Egwald Statistics: Multiple Regression
Linear and Restricted Multiple Regression
by
Elmer G. Wiens
The Regression Model
Let's use some of the notation from the popular statistics textbook:
Weisberg, Sanford. Applied Linear Regression, Second Edition. New York: Wiley, 1985.
Let X be the mxn matrix of predictors, Y the mx1 vector of responses, and ß the nx1 vector of unknown parameters (coefficients). I assume a linear relation of the form:
Y = X * ß
If I want an intercept term, the first column of X must be all 1's. Also, I want m > n and X to have full column rank (the columns are linearly independent).
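As a minimal sketch of this setup (the numbers below are made-up illustration data, not from any real study), the design matrix gets a leading column of 1's for the intercept:

```python
import numpy as np

# Hypothetical data: m = 5 observations of one predictor, plus an intercept, so n = 2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: first column all 1's (the intercept column), giving an mxn matrix with m > n.
X = np.column_stack([np.ones_like(x1), x1])
```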
Householder's Algorithm:
Decompose the matrix X into:
X = Q*U
where Q is an mxm orthogonal matrix and U is an mxn matrix with:
the first n rows of U = V, an nxn upper triangular (non-singular) matrix; the last m-n rows of U = W, an (m-n)xn matrix whose entries are all 0's.
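Here is a minimal sketch of that decomposition using NumPy's QR routine (which is built on Householder reflections), continuing the hypothetical X above; mode='complete' returns the full mxm Q and mxn U:

```python
import numpy as np

m, n = X.shape  # X from the sketch above

# Full QR decomposition: Q is m x m orthogonal, U is m x n.
Q, U = np.linalg.qr(X, mode='complete')

V = U[:n, :]   # n x n upper triangular, non-singular when X has full column rank
W = U[n:, :]   # (m-n) x n block of zeros

assert np.allclose(X, Q @ U)             # X = Q*U
assert np.allclose(W, 0.0)               # last m-n rows of U are 0
assert np.allclose(Q @ Q.T, np.eye(m))   # Q*Q' = I
```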
Rewrite the regression problem as:
Y = Q*U*ß
or
U*ß = Q'*Y = b, an mx1 vector,
where the superscript ' means matrix transposition.
Note that Q orthogonal means that Q*Q' = I, the mxm identity matrix.
Let b1 = the first n components of b; b2 = the last m-n components of b.
Taking advantage of the structure of U, write:
V*ß = b1 (*)
The linear problem (*) is equivalent to the original problem. Moreover:
- I can solve for ß by back substitution.
- The regression problem's residual = ||b2||, the norm of b2.
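Continuing the sketch, b = Q'*Y is split into b1 and b2, ß is recovered by an explicit back-substitution loop, and the residual norm falls out of b2; the loop is written out to mirror the algorithm rather than calling a library solver:

```python
import numpy as np

b = Q.T @ Y            # b = Q'*Y, an m x 1 vector (Q, Y, V, n from the sketch above)
b1, b2 = b[:n], b[n:]  # first n components, last m-n components

# Back substitution on V*beta = b1: V is upper triangular, so solve from the last row upward.
beta = np.zeros(n)
for i in range(n - 1, -1, -1):
    beta[i] = (b1[i] - V[i, i + 1:] @ beta[i + 1:]) / V[i, i]

residual = np.linalg.norm(b2)   # the regression problem's residual, ||b2||
```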
So I have solved the main part of the problem. The statistical terms can be calculated as:
- Residual sum of squares: RSS = ||b2||^2
- Residual mean square (MSRSS): s2 = RSS/(m-n)
- Estimated variance of ß: var(ß) = s2 * (V'*V)^(-1), where ^(-1) denotes the inverse of the matrix. Note: var(ß) = VC, an nxn symmetric matrix, the variance-covariance matrix of ß
- Standard error of parameter ßi: stderror(i) = sqrt(VC(i,i))
- The t-value of parameter ßi: t-val = ßi / stderror(i)
- Sum of squares of Y: YY = Y'*Y
- Sum of Y: sumY = sum(Y) = sum(Y1,Y2,...,Ym)
- Total corrected sum of squares: SYY = YY - sumY*sumY/m
- Regression sum of squares: SSreg = SYY - RSS
- Coefficient of determination: R2 = SSreg/SYY
- Adjusted coefficient of determination: R2b = 1 - (1 -R2)*(m-1)/(m-n)
- Regression mean square: MSSreg = SSreg/(n-1)
- F-test for regression: compare F = MSSreg / MSRSS with the F(n-1,m-n) distribution
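A sketch of these quantities in NumPy, continuing from the values computed above (the SciPy call at the end, giving an F-test p-value, is an optional extra not required by the text):

```python
import numpy as np
from scipy import stats            # only needed for the optional p-value below

RSS = b2 @ b2                      # residual sum of squares = ||b2||^2
s2 = RSS / (m - n)                 # residual mean square (MSRSS)

iV = np.linalg.inv(V)              # Note 2 below shows a cheaper back-substitution route
VC = s2 * (iV @ iV.T)              # variance-covariance matrix: s2 * (V'*V)^(-1)
stderror = np.sqrt(np.diag(VC))    # standard errors of the parameters
t_val = beta / stderror            # t-values

YY = Y @ Y                         # sum of squares of Y
sumY = Y.sum()                     # sum of Y
SYY = YY - sumY * sumY / m         # total corrected sum of squares
SSreg = SYY - RSS                  # regression sum of squares
R2 = SSreg / SYY                   # coefficient of determination
R2b = 1 - (1 - R2) * (m - 1) / (m - n)   # adjusted R^2
MSSreg = SSreg / (n - 1)           # regression mean square
F = MSSreg / s2                    # compare with the F(n-1, m-n) distribution
p_value = stats.f.sf(F, n - 1, m - n)
```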
Analysis of Variance
| Source     | Sum of Squares | Degrees of Freedom | Mean Square | F(n-1,m-n) |
| Regression | SSreg          | n-1                | MSSreg      | F          |
| Residual   | RSS            | m-n                | MSRSS       |            |
| Total      | SYY            | m-1                |             |            |
Notes:
1. Do not form the matrix X'*X and then compute its inverse. It takes too much computer time and could be numerically unstable.
2. Do not compute (V'*V)^(-1) directly. Instead, solve for each column of iV, the inverse matrix of V, by back substitution:
V * iV = I
where I is the nxn identity matrix. Then (V'*V)^(-1) = iV*iV'. Compute iV*iV' if necessary (it is symmetric), or only its diagonal if you just need the standard errors of the parameters.
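A sketch of Note 2, continuing from V and s2 above: each column of iV is obtained by back substitution against a column of the identity matrix, so no general-purpose inversion routine is needed:

```python
import numpy as np

def upper_triangular_inverse(V):
    """Invert an upper triangular matrix column by column via back substitution."""
    n = V.shape[0]
    iV = np.zeros((n, n))
    I = np.eye(n)
    for j in range(n):                        # solve V * iV[:, j] = I[:, j]
        for i in range(n - 1, -1, -1):
            iV[i, j] = (I[i, j] - V[i, i + 1:] @ iV[i + 1:, j]) / V[i, i]
    return iV

iV = upper_triangular_inverse(V)
VC = s2 * (iV @ iV.T)                           # s2 * (V'*V)^(-1), symmetric
stderror = np.sqrt(s2 * np.sum(iV**2, axis=1))  # diagonal only, for the standard errors
```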
Restricted least squares:
Having obtained the unrestricted estimate of ß, and given q linear restrictions of the form:
r = R * ß,
where R is a qxn matrix and r is a qx1 vector, compute the restricted estimate rß as:
Let S = (V'*V)^(-1) * R' * [R * (V'*V)^(-1) * R']^(-1)
then rß = ß + S * [r - R*ß]
- The residual = ||Y - X * rß||
- Residual sum of squares: RSS = ||Y - X * rß||^2
- Residual mean square (MSRSS): s2 = RSS/(m-n)
- Estimated variance of rß: var(rß)= s2 * (I - S * R) * (V'*V)^(-1)
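A sketch of the restricted estimator, continuing the sketch above; the restriction matrix R and vector r here are hypothetical (a single restriction forcing the slope coefficient to equal 2), chosen only to make the example run:

```python
import numpy as np

# Hypothetical restriction r = R*beta: force the second coefficient to equal 2 (q = 1).
R = np.array([[0.0, 1.0]])     # q x n
r = np.array([2.0])            # q x 1

iVtV = iV @ iV.T                                    # (V'*V)^(-1), from Note 2
S = iVtV @ R.T @ np.linalg.inv(R @ iVtV @ R.T)      # n x q
rbeta = beta + S @ (r - R @ beta)                   # restricted estimate rß

resid = Y - X @ rbeta
rRSS = resid @ resid                                # ||Y - X*rß||^2
rs2 = rRSS / (m - n)                                # residual mean square
var_rbeta = rs2 * (np.eye(n) - S @ R) @ iVtV        # estimated variance of rß
```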