REGRESS Multiple linear regression using least squares.
B = REGRESS(Y,X)
returns the vector B of regression coefficients in the linear model Y = X*B.
X is an n-by-p design matrix, with rows corresponding to observations and columns to predictor variables.
Y is an n-by-1 vector of response observations (the dependent variable).

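REGRESS itself belongs to the MATLAB Statistics Toolbox, but the ordinary least-squares fit it performs can be sketched outside MATLAB. A minimal NumPy illustration, using made-up data and a hypothetical "true" coefficient vector [1, 2, 3] (the design matrix includes a column of ones for the constant term, as recommended below):

```python
import numpy as np

# Hypothetical data: 4 observations, a constant column plus 2 predictors.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])
# Response generated without noise from known coefficients [1, 2, 3].
Y = X @ np.array([1.0, 2.0, 3.0])

# Least-squares estimate: B minimizes ||Y - X*B||^2, as in B = regress(Y, X).
B, residual_ss, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(B, 6))  # recovers [1. 2. 3.]
```

Because the data are noise-free and X has full column rank, the fit recovers the generating coefficients exactly.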
[B,BINT] = REGRESS(Y,X)
returns a matrix BINT of 95% confidence intervals for B.

[B,BINT,R] = REGRESS(Y,X)
returns a vector R of residuals.

[B,BINT,R,RINT] = REGRESS(Y,X)
returns a matrix RINT of intervals that can be used to diagnose outliers. If the interval RINT(i,:) does not contain zero, then the i-th residual is larger than would be expected at the 5% significance level. This is evidence that the i-th observation is an outlier (for example, a recording error).

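The "does the interval contain zero" check described above is easy to apply to the RINT matrix once it is returned. A small sketch with hypothetical interval values (the rows and endpoints here are made up, not output of an actual fit):

```python
import numpy as np

# Hypothetical residual intervals, one row per observation, columns [lower, upper],
# mimicking the shape of RINT returned by regress.
RINT = np.array([[-1.2, 0.8],
                 [ 0.3, 2.1],   # excludes zero -> flagged as a potential outlier
                 [-0.5, 0.9]])

# An interval excludes zero exactly when both endpoints have the same sign.
outliers = np.where((RINT[:, 0] > 0) | (RINT[:, 1] < 0))[0]
print(outliers)  # -> [1]
```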
[B,BINT,R,RINT,STATS] = REGRESS(Y,X)
returns a vector STATS containing the R-square statistic, the F statistic and its p-value for the full model, and an estimate of the error variance.

[...] = REGRESS(Y,X,ALPHA)
uses a 100*(1-ALPHA)% confidence level to compute BINT, and a 100*ALPHA% significance level to compute RINT.

X should include a column of ones so that the model contains a constant term. The F statistic and p-value are computed under the assumption that the model contains a constant term; they are not correct for models without one. The R-square value is one minus the ratio of the error sum of squares to the total sum of squares. This value can be negative for models without a constant term, which indicates that the model is not appropriate for the data (i.e., a multiple linear model does not suit it).

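The negative-R-square case is easy to reproduce. With made-up data that have a large positive mean but a downward trend, a model forced through the origin (no column of ones) fits worse than simply predicting the mean, so 1 - SSE/SST goes negative:

```python
import numpy as np

# Hypothetical data: negative trend, large positive mean.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 9.0, 8.0, 7.0])

# Model WITHOUT a constant term: y = b*x (no column of ones in the design matrix).
X = x[:, None]
b, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - sse / sst
print(r2)  # negative: the no-constant model is worse than the mean alone
```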
If the columns of X are linearly dependent, REGRESS sets the maximum possible number of elements of B to zero to obtain a "basic solution", and returns zeros in the elements of BINT corresponding to the zero elements of B.

REGRESS treats NaNs in X or Y as missing values, and removes them.
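This listwise deletion of missing values can be sketched as follows (the data here are hypothetical; any row with a NaN anywhere in X or Y is dropped before fitting):

```python
import numpy as np

# Hypothetical data with one missing (NaN) observation in Y.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([5.0, np.nan, 9.0])

# Keep only rows where neither X nor Y contains a NaN.
keep = ~(np.isnan(X).any(axis=1) | np.isnan(Y))
Xc, Yc = X[keep], Y[keep]
print(Xc.shape)  # (2, 2): the NaN row was removed before fitting
```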