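Leave-One-Out Cross-Validation (LOOCV)
LOOCV stands for Leave-One-Out Cross Validation. The idea is simple: it is k-fold cross-validation taken to the limit k = N, where N is the number of samples. In k-fold cross-validation the dataset is split into k subsets; k-1 of them form the training set and the remaining one is the test set, then the next subset becomes the test set with the other k-1 as the training set, and so on. In LOOCV each subset is a single sample, so every sample is held out exactly once. Its main purpose is to estimate how well a model generalizes, which helps detect overfitting. The drawback is a long computation time.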
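
The hold-one-out loop described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the toy matrix `X`, the labels `y`, and the use of a 1-nearest-neighbour classifier are made up for the example and are not taken from the post's dataset.

```matlab
% Toy data (hypothetical): six 2-D points in two well-separated classes.
X = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
y = [1; 1; 1; 2; 2; 2];
N = size(X, 1);

errors = 0;
for j = 1:N
    trainMask = true(N, 1);
    trainMask(j) = false;            % hold out sample j
    Xtrain = X(trainMask, :);
    ytrain = y(trainMask);

    % 1-nearest-neighbour prediction for the held-out sample
    d = sqrt(sum((Xtrain - repmat(X(j, :), N - 1, 1)).^2, 2));
    [~, nearest] = min(d);
    if ytrain(nearest) ~= y(j)
        errors = errors + 1;
    end
end
looError = errors / N;               % fraction of held-out samples misclassified
```

Each of the N iterations trains on N-1 samples and tests on the remaining one, which is where the long running time comes from.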

Applicable scenario:

Small datasets. With an ordinary train/validation split, the already scarce training data loses another portion to validation, leaving even less to train on. LOOCV makes full use of the data.

Fast leave-one-out KNN

LOOCV splits the data N times, producing N train/test partitions, so one evaluation round has to train N models and the training time grows accordingly. To work around this, we can exploit how leave-one-out works: compute the distances between samples (or intermediate values of those distances) in advance and store them, then simply look them up in the stored index while running LOOCV. The code in this post uses feature selection as a demo to check the fast KNN leave-one-out approach.
Here FSKNN1 is the plain KNN and FSKNN2 is the fast KNN.
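
Why caching an intermediate value is enough: a squared Euclidean distance over a feature subset is just the sum of the per-feature squared differences for that subset. The sketch below checks this on made-up data. The names `cache`, `X`, `dFast` and `dDirect` are hypothetical and chosen for the illustration; only `inmodel` mirrors a name used in the post.

```matlab
% Hypothetical toy data: 4 samples, 3 features.
X = [0.1 0.9 0.3;
     0.4 0.2 0.8;
     0.7 0.5 0.1;
     0.2 0.6 0.6];
N = size(X, 1);

% Precompute once: cache{j} holds the per-feature squared differences
% between sample j and each of the other N-1 samples.
cache = cell(N, 1);
for j = 1:N
    others = X([1:j-1, j+1:N], :);
    cache{j} = (others - repmat(X(j, :), N - 1, 1)).^2;
end

% Later, for any feature subset, a distance is only a column sum.
inmodel = logical([1 0 1]);          % example mask: keep features 1 and 3
j = 2;
dFast = sqrt(sum(cache{j}(:, inmodel), 2));

% Direct computation for comparison.
others = X([1:j-1, j+1:N], inmodel);
dDirect = sqrt(sum((others - repmat(X(j, inmodel), N - 1, 1)).^2, 2));

maxGap = max(abs(dFast - dDirect));  % largest disagreement between the two
```

The trade-off is memory: the cache stores N matrices of size (N-1) by the number of features, so this approach fits small-sample datasets best.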

Main script main.m

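- clc
- % Load the DLBCL data, already split into training and test sets
- [train_F,train_L,test_F,test_L] = divide_dlbcl();
- dim = size(train_F,2);
- % Random feature-selection vector; features with value > choice are selected
- individual = rand(1,dim);
- global choice
- choice = 0.5;
- global knnIndex
- % Cache the per-feature squared differences once, before the LOOCV runs
- [knnIndex] = preKNN(individual,train_F);
- % Run both versions 100 times so their running times can be compared
- for i = 1:100
- [error,fs] = FSKNN1(individual,train_F,train_L);
- [error2,fs2] = FSKNN2(individual,train_F,train_L);
- end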

Dataset split divide_dlbcl.m

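- function [train_F,train_L,test_F,test_L] = divide_dlbcl()
- % DLBCL.mat is expected to provide ins (feature matrix) and lab (labels)
- load DLBCL.mat;
- dataMat=ins;
- len=size(dataMat,1);
- % Min-max normalization of each feature to [0,1]
- maxV = max(dataMat);
- minV = min(dataMat);
- range = maxV-minV;
- range(range==0) = 1; % avoid 0/0 = NaN for constant features
- newdataMat = (dataMat-repmat(minV,[len,1]))./(repmat(range,[len,1]));
- % Assign each sample to one of 10 folds (crossvalind is in the Bioinformatics Toolbox)
- Indices = crossvalind('Kfold', length(lab), 10);
- % Folds 1-3 (about 30% of the samples) form the test set
- site = find(Indices==1|Indices==2|Indices==3);
- test_F = newdataMat(site,:);
- test_L = lab(site);
- % The remaining folds form the training set
- site2 = find(Indices~=1&Indices~=2&Indices~=3);
- train_F = newdataMat(site2,:);
- train_L =lab(site2);
- end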

Plain KNN

FSKNN1.m

- function [error,fs] = FSKNN1(x,train_F,train_L)
- global choice
- inmodel = x>choice; % select features whose value exceeds a suitable threshold
- k=1;
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1); % true = sample is in the training part
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtrainL = train_L(flag);
- CtestF = train_f(~flag,:);
- CtestL = train_L(~flag);
- classifyresult= KNN1(CtestF,CtrainF,CtrainL,k);
- if (CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1; % put sample j back
- end
- error=error/train_length; % LOOCV error rate
- fs = sum(inmodel); % number of selected features
- end

KNN1.m

- function relustLabel = KNN1(inx,data,labels,k)
- %%
- % inx is the test sample, data holds the training samples, labels holds their labels; k is user-chosen (1 to 3)
- %%
- datarow = size(data,1);
- diffMat = repmat(inx,[datarow,1]) - data ;
- distanceMat = sqrt(sum(diffMat.^2,2)); % Euclidean distance to every training sample
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the k nearest neighbors
- end

Fast KNN

preKNN.m

- function [knnIndex] = preKNN(x,train_F)
- inmodel = x > 0; % x comes from rand in main.m, so this keeps every feature
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1);
- knnIndex = cell(train_length,1);
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtestF = train_f(~flag,:);
- [datarow , ~] = size(CtrainF);
- diffMat = repmat(CtestF,[datarow,1]) - CtrainF ;
- diffMat = diffMat.^2; % per-feature squared differences, not yet summed
- knnIndex{j,1} = diffMat;
- flag(j) = 1; % put sample j back
- end
- end

FSKNN2.m

- function [error,fs] = FSKNN2(x,train_F,train_L)
- global choice
- inmodel = x>choice; % select features whose value exceeds a suitable threshold
- global knnIndex
- k=1;
- train_length = size(train_F,1);
- flag = true(train_length,1);
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainL = train_L(flag);
- CtestL = train_L(~flag);
- % Look up the cached squared differences and keep only the selected features
- classifyresult= KNN2(CtrainL,k,knnIndex{j}(:,inmodel));
- if(CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1; % put sample j back
- end
- error=error/train_length; % LOOCV error rate
- fs = sum(inmodel); % number of selected features
- end

KNN2.m

- function relustLabel = KNN2(labels,k,diffMat)
- distanceMat = sqrt(sum(diffMat,2)); % sum the cached squared differences over the selected features
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the k nearest neighbors
- end

Results

The run shows that FSKNN2 + preKNN takes much less time than FSKNN1.
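
One rough way to time such a comparison is with `tic` and `toc`. The sketch below does this on synthetic data rather than on DLBCL, under stated assumptions: the sizes `N` and `D`, the repeat count `reps`, and all variable names are hypothetical, and it inlines a 1-NN search instead of calling the functions from the post. Timings depend on the machine, so no figures are claimed here.

```matlab
% Hypothetical synthetic data; the sizes are arbitrary.
N = 60; D = 200;
X = rand(N, D);
y = [ones(N/2, 1); 2 * ones(N/2, 1)];
inmodel = rand(1, D) > 0.5;          % random feature mask

% Build the cache of per-feature squared differences once.
cache = cell(N, 1);
for j = 1:N
    keep = [1:j-1, j+1:N];
    cache{j} = (X(keep, :) - repmat(X(j, :), N - 1, 1)).^2;
end

reps = 20;

% Direct version: recompute the differences in every LOOCV step.
tic;
for r = 1:reps
    predDirect = zeros(N, 1);
    for j = 1:N
        keep = [1:j-1, j+1:N];
        d = sum((X(keep, inmodel) - repmat(X(j, inmodel), N - 1, 1)).^2, 2);
        [~, idx] = min(d);
        predDirect(j) = y(keep(idx));
    end
end
tDirect = toc;

% Cached version: only select columns and sum.
tic;
for r = 1:reps
    predCached = zeros(N, 1);
    for j = 1:N
        keep = [1:j-1, j+1:N];
        [~, idx] = min(sum(cache{j}(:, inmodel), 2));
        predCached(j) = y(keep(idx));
    end
end
tCached = toc;

fprintf('direct: %.3f s, cached: %.3f s\n', tDirect, tCached);
```

Both loops should produce the same predictions; only the time spent differs, and the cost of building the cache is paid once up front.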