EDA365 Electronics Forum

Title: Using the regress function

Author: smileqq    Time: 2020-4-14 10:34

[b,bint,r,rint,stats] = regress(y,X)

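This is the calling syntax of regress, which performs multiple linear regression.
Question 1: The third argument of regress is the confidence level and is optional, but whether or not I supply it, I always get this warning: R-square and the F statistic are not well-defined unless X has a column of ones.
Type "help regress" for more information.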

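Question 2: r is the difference between the predicted and the actual values, so is r'*r the residual sum of squares? Can it be used to judge how good the regression model is?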

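Question 3: stats is an array. The help says: "The vector stats contains the R2 statistic along with the F and p values for the regression."
Many usage guides online, and this line from the MATLAB help as well, mention only three elements of stats, but when I call regress, stats has four elements. What does the extra one represent?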

Author: ExxNEN    Time: 2020-4-14 18:28
REGRESS Multiple linear regression using least squares.

B = REGRESS(Y,X)
returns the vector B of regression coefficients in the linear model Y = X*B.
X is an n-by-p design matrix, with rows corresponding to observations and columns to predictor variables.
Y is an n-by-1 vector of response observations (that is, the dependent variable).

[B,BINT] = REGRESS(Y,X)
returns a matrix BINT of 95% confidence intervals for B.

[B,BINT,R] = REGRESS(Y,X)
returns a vector R of residuals.
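To make the first three outputs concrete, here is a minimal sketch. The data are made up for illustration (roughly y = 2 + 3*x), and it assumes the Statistics and Machine Learning Toolbox is installed so that regress is available.

```matlab
% Illustrative data, invented for this example: y is roughly 2 + 3*x.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];

% Design matrix: a column of ones for the intercept, then the predictor.
X = [ones(size(x)) x];

[b, bint, r] = regress(y, X);

% b(1) is the intercept and b(2) the slope.
% Each row of bint is [lower upper] for the matching element of b.
% r is the observed y minus the fitted values X*b.
disp(b);
disp(bint);
disp(r);
```

Each row of bint brackets the corresponding coefficient, so a row that contains zero suggests that coefficient is not clearly different from zero at the chosen level.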

[B,BINT,R,RINT] = REGRESS(Y,X)
returns a matrix RINT of intervals that can be used to diagnose outliers.
If RINT(i,:) does not contain zero, then the i-th residual is larger than would be expected, at the 5% significance level. This is evidence that the i-th observation is an outlier.
(Translator's note: an outlier here means an anomalous observation, that is, a point that may be erroneous and meaningless, for example because of a recording error.)
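The RINT rule above can be turned into a quick outlier check. This is only a sketch on invented data with one deliberately corrupted point; the variable name outlierIdx is my own, and it assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Invented data: an exact line plus small alternating noise.
x = (1:10)';
y = 1 + 2*x + 0.1*[1; -1; 1; -1; 1; -1; 1; -1; 1; -1];
y(6) = y(6) + 10;              % deliberately corrupt observation 6

X = [ones(size(x)) x];         % include the constant term
[~, ~, r, rint] = regress(y, X);

% An interval that lies entirely above or entirely below zero
% does not contain zero, which flags the observation as a possible outlier.
outlierIdx = find(rint(:,1) > 0 | rint(:,2) < 0);
disp(outlierIdx);
```

A flagged point is only a candidate: it is worth checking the raw record before dropping it.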

[B,BINT,R,RINT,STATS] = REGRESS(Y,X)
returns a vector STATS containing the R-square statistic, the F statistic and p value for the full model, and an estimate of the error variance.
(The error variance estimate is the fourth element of stats asked about in question 3.)
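On questions 2 and 3, the four elements of STATS can be checked by hand. The sketch below uses invented data and assumes the Statistics and Machine Learning Toolbox; the names SSE, SST, R2 and s2 are mine. As far as I know, the error variance estimate is the residual sum of squares divided by n - p, and the check at the end is consistent with that.

```matlab
% Invented data: y is roughly 2 + 3*x.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];
X = [ones(size(x)) x];

[b, ~, r, ~, stats] = regress(y, X);

[n, p] = size(X);
SSE = r' * r;                     % residual (error) sum of squares
SST = sum((y - mean(y)).^2);      % total sum of squares
R2  = 1 - SSE / SST;              % compare with stats(1)
s2  = SSE / (n - p);              % compare with stats(4)

fprintf('R2 = %.6f, stats(1) = %.6f\n', R2, stats(1));
fprintf('s2 = %.6f, stats(4) = %.6f\n', s2, stats(4));
```

So r'*r is indeed the residual sum of squares. On its own it depends on the units of y and on the number of observations, which is why R-square or the error variance estimate is usually easier to compare between models.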

[...] = REGRESS(Y,X,ALPHA)
uses a 100*(1-ALPHA)% confidence level to compute BINT, and a (100*ALPHA)% significance level to compute RINT.
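A small sketch of the effect of ALPHA, on invented data and assuming the Statistics and Machine Learning Toolbox. Note that the third argument is the significance level (0.05 gives 95% intervals), not the confidence level itself.

```matlab
% Invented data: y is roughly 2 + 3*x.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];
X = [ones(size(x)) x];

[~, bint95] = regress(y, X);          % default, 95% intervals
[~, bint99] = regress(y, X, 0.01);    % ALPHA = 0.01, 99% intervals

% Interval widths for each coefficient.
width95 = bint95(:,2) - bint95(:,1);
width99 = bint99(:,2) - bint99(:,1);
disp([width95 width99]);
```

The 99% intervals come out wider than the 95% ones, as expected when asking for more confidence from the same data.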

X should include a column of ones so that the model contains a constant term.
The F statistic and p value are computed under the assumption that the model contains a constant term, and they are not correct for models without a constant.
The R-square value is one minus the ratio of the error sum of squares to the total sum of squares, that is, R-square = 1 - SSE/SST.
This value can be negative for models without a constant, which indicates that the model is not appropriate for the data. (Translator's note: that is, the data are not suited to a multiple linear model.)
The warning quoted in question 1 points at this: it says that X has no column of ones.
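For the warning in question 1, the usual fix is to prepend a column of ones to X. A sketch on invented data, assuming the Statistics and Machine Learning Toolbox; bNoConst and statsNoConst are names I made up for the comparison.

```matlab
% Invented data: y is roughly 2 + 3*x.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];

% No constant term: the model is y = b*x, forced through the origin.
% Requesting stats here is expected to raise the warning from question 1.
[bNoConst, ~, ~, ~, statsNoConst] = regress(y, x);

% With a constant term: the model is y = b(1) + b(2)*x.
X = [ones(size(x)) x];
[b, ~, ~, ~, stats] = regress(y, X);

disp(bNoConst);
disp(b);
disp(stats);
```

If a model through the origin is really what you want, the coefficients are still usable; only the R-square, F and p entries of stats should not be trusted in that case.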

If columns of X are linearly dependent, REGRESS sets the maximum possible number of elements of B to zero to obtain a "basic solution", and returns zeros in elements of BINT corresponding to the zero elements of B.
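The "basic solution" behaviour can be seen by feeding in a design matrix with a redundant column. This is a sketch on invented data, assuming the Statistics and Machine Learning Toolbox; which of the dependent columns gets the zero coefficient is left to regress, so the code looks it up instead of assuming it.

```matlab
% Invented data: y is roughly 2 + 3*x.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];

% The third column is exactly twice the second, so X is rank deficient.
X = [ones(size(x)) x 2*x];

% regress is expected to warn about rank deficiency here.
[b, bint] = regress(y, X);

zeroIdx = find(b == 0);        % coefficients that were set to zero
disp(b');
disp(bint(zeroIdx, :));        % the matching rows of bint are zero too
```

In practice it is cleaner to drop the redundant column yourself, so that every remaining coefficient has a clear meaning.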

REGRESS treats NaNs in X or Y as missing values, and removes them.
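A quick way to convince yourself of the NaN handling is to compare against removing the rows by hand. A sketch on invented data, assuming the Statistics and Machine Learning Toolbox; yMissing, bAuto and bManual are illustrative names.

```matlab
% Invented data: y is roughly 2 + 3*x, with one value marked as missing.
x = (1:8)';
y = [5.1; 7.9; 11.2; 13.8; 17.1; 20.2; 22.9; 26.0];
X = [ones(size(x)) x];

yMissing = y;
yMissing(3) = NaN;                     % pretend observation 3 was not recorded

% Let regress drop the incomplete observation itself.
bAuto = regress(yMissing, X);

% Drop the same observation manually and fit again.
keep = ~isnan(yMissing);
bManual = regress(yMissing(keep), X(keep, :));

disp([bAuto bManual]);
```

Both fits should give the same coefficients, since they use the same seven complete observations.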
