Leave-One-Out Cross-Validation (LOOCV)
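Leave-one-out cross-validation (LOOCV) is easy to understand: it is k-fold cross-validation with k equal to the number of samples N. Each sample in turn serves as the test set while the remaining N-1 samples form the training set, and this repeats until every sample has been held out once. Its main purpose is to estimate how well a model generalizes, which helps detect overfitting. The drawback is a long computation time, since N models have to be trained.
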
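A minimal sketch of the splitting scheme described above, on a hypothetical 5-sample toy matrix. The values and the names `isTrain` and `trainSizes` are illustrative and do not come from the original post.

```matlab
% Toy data: 5 samples, 2 features (hypothetical values, for illustration only)
X = [1 2; 2 1; 8 9; 9 8; 5 5];
N = size(X, 1);
trainSizes = zeros(N, 1);
for j = 1:N
    isTrain = true(N, 1);
    isTrain(j) = false;              % hold out sample j
    Xtrain = X(isTrain, :);          % the other N-1 samples
    Xtest  = X(~isTrain, :);         % the single held-out sample
    trainSizes(j) = size(Xtrain, 1);
    fprintf('fold %d: held-out sample = %d, training samples = %d\n', ...
            j, j, trainSizes(j));
end
```

Every fold trains on N-1 samples and tests on exactly one, so the loop body runs N times.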
When to use it:
The dataset is small. With an ordinary split into a training set and a validation set, data that is already scarce loses a further portion to validation, leaving even less for training. LOOCV makes full use of the data, because every fold trains on all samples but one.

Fast leave-one-out KNN

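Because LOOCV splits the data N times and produces N train/test partitions, one round of evaluation has to build N models, which greatly increases the running time. To address this, we can exploit the structure of leave-one-out: compute the distances between samples (or intermediate values of those distances) in advance and store them, then simply look them up by index during LOOCV. In this demo the stored intermediate values are the per-feature squared differences, so the distance for any feature subset is obtained by summing the selected columns. The code below uses feature selection as a demo to verify the fast leave-one-out KNN.
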
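The idea can be checked on synthetic data. The sketch below assumes random data and a hypothetical feature mask; the names `sqDiff`, `others` and `maxGap` are illustrative and not part of the original code. It caches the per-feature squared differences once and then confirms that summing the selected columns reproduces the directly computed Euclidean distances.

```matlab
X = rand(6, 4);                          % 6 samples, 4 features (synthetic)
N = size(X, 1);

% Precompute once: sqDiff{j} holds the per-feature squared differences
% between sample j and each of the other N-1 samples.
sqDiff = cell(N, 1);
for j = 1:N
    others = true(N, 1);
    others(j) = false;
    sqDiff{j} = (X(others, :) - repmat(X(j, :), [N-1, 1])).^2;
end

inmodel = logical([1 0 1 1]);            % hypothetical feature subset
maxGap = 0;
for j = 1:N
    others = true(N, 1);
    others(j) = false;
    % Distance recomputed from scratch on the selected features
    dDirect = sqrt(sum((X(others, inmodel) - repmat(X(j, inmodel), [N-1, 1])).^2, 2));
    % Distance assembled from the cached squared differences
    dCached = sqrt(sum(sqDiff{j}(:, inmodel), 2));
    maxGap = max(maxGap, max(abs(dDirect - dCached)));
end
fprintf('largest gap between direct and cached distances: %g\n', maxGap);
```

The cache costs roughly N x (N-1) x D stored values, so this trade of memory for time suits small-sample datasets.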
Here FSKNN1 is the plain KNN and FSKNN2 is the fast KNN.

Main script main.m

- clc
- % Split the DLBCL data into training and test sets
- [train_F,train_L,test_F,test_L] = divide_dlbcl();
- dim = size(train_F,2);
- individual = rand(1,dim); % random feature weights in (0,1)
- global choice
- choice = 0.5; % feature j is selected when individual(j) > choice
- global knnIndex
- % Precompute the per-feature squared differences once
- [knnIndex] = preKNN(individual,train_F);
- % Run both versions 100 times so their running times can be compared
- for i = 1:100
- [error,fs] = FSKNN1(individual,train_F,train_L);
- [error2,fs2] = FSKNN2(individual,train_F,train_L);
- end

Dataset split divide_dlbcl.m

- function [train_F,train_L,test_F,test_L] = divide_dlbcl()
- % DLBCL.mat is expected to provide ins (samples x features) and lab (labels)
- load DLBCL.mat;
- dataMat=ins;
- len=size(dataMat,1);
- % Min-max normalization of each feature
- maxV = max(dataMat);
- minV = min(dataMat);
- range = maxV-minV;
- range(range==0) = 1; % avoid division by zero for constant features
- newdataMat = (dataMat-repmat(minV,[len,1]))./(repmat(range,[len,1]));
- % Assign each sample to one of 10 folds; folds 1 to 3 form the test set
- Indices = crossvalind('Kfold', length(lab), 10);
- site = find(Indices==1|Indices==2|Indices==3);
- test_F = newdataMat(site,:);
- test_L = lab(site);
- % The remaining seven folds form the training set
- site2 = find(Indices~=1&Indices~=2&Indices~=3);
- train_F = newdataMat(site2,:);
- train_L =lab(site2);
- end

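The normalization step can be tried on a small matrix. This is a sketch on hypothetical toy values, not the DLBCL data. It includes a guard for constant columns, which is an assumption about how zero ranges should be handled: without it, a constant feature yields 0/0 = NaN.

```matlab
dataMat = [1 10 5; 3 10 7; 5 10 9];      % toy matrix; column 2 is constant
len  = size(dataMat, 1);
minV = min(dataMat);                     % column-wise minimum
maxV = max(dataMat);                     % column-wise maximum
range = maxV - minV;                     % zero for the constant column
range(range == 0) = 1;                   % guard: avoid 0/0 = NaN
scaled = (dataMat - repmat(minV, [len, 1])) ./ repmat(range, [len, 1]);
disp(scaled);
```

With the guard, a constant column maps to all zeros while the other columns are scaled to [0, 1].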
Plain KNN

FSKNN1.m

- function [error,fs] = FSKNN1(x,train_F,train_L)
- % LOOCV error of KNN on the selected features, recomputing distances in every fold
- global choice
- inmodel = x>choice; % set a suitable threshold to select features
- k=1;
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1); % true = sample belongs to the training set
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtrainL = train_L(flag);
- CtestF = train_f(~flag,:);
- CtestL = train_L(~flag);
- classifyresult= KNN1(CtestF,CtrainF,CtrainL,k);
- if (CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1; % put sample j back
- end
- error=error/train_length; % LOOCV error rate
- fs = sum(inmodel); % number of selected features
- end

KNN1.m
- function relustLabel = KNN1(inx,data,labels,k)
- % inx: test sample (row vector); data: training samples; labels: training labels
- % k: number of neighbors, chosen by the user (1 to 3)
- datarow = size(data,1);
- diffMat = repmat(inx,[datarow,1]) - data ;
- distanceMat = sqrt(sum(diffMat.^2,2)); % Euclidean distance to each training sample
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the nearest neighbors
- end

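To illustrate the distance-sort-vote logic of KNN1, here is a sketch that inlines the same steps on hypothetical toy points. The data, the query point and the variable `predicted` are made up for illustration.

```matlab
data   = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];    % toy training points
labels = [1; 1; 1; 2; 2; 2];                % two classes
inx    = [4 4];                             % query point
k      = 3;

diffMat     = repmat(inx, [size(data, 1), 1]) - data;
distanceMat = sqrt(sum(diffMat.^2, 2));     % Euclidean distances
[~, IX]     = sort(distanceMat, 'ascend');  % nearest first
predicted   = mode(labels(IX(1:k)));        % majority vote among the k nearest
fprintf('predicted label: %d\n', predicted);
```

The three points nearest to (4, 4) all belong to class 2, so the vote returns 2.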
Fast KNN

preKNN.m

- function [knnIndex] = preKNN(x,train_F)
- % For each held-out sample j, cache the per-feature squared differences to all other samples
- inmodel = x > 0; % keep every feature with a positive weight so any subset can be chosen later
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1);
- knnIndex = cell(train_length,1);
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtestF = train_f(~flag,:);
- [datarow , ~] = size(CtrainF);
- diffMat = repmat(CtestF,[datarow,1]) - CtrainF ;
- diffMat = diffMat.^2; % squared difference per feature, not yet summed
- knnIndex{j,1} = diffMat;
- flag(j) = 1;
- end
- end

FSKNN2.m

- function [error,fs] = FSKNN2(x,train_F,train_L)
- % LOOCV error of KNN on the selected features, using the squared differences cached by preKNN
- global choice
- inmodel = x>choice; % set a suitable threshold to select features
- global knnIndex
- k=1;
- train_length = size(train_F,1);
- flag = true(train_length,1);
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainL = train_L(flag);
- CtestL = train_L(~flag);
- % Pick the cached columns of the selected features instead of recomputing differences
- classifyresult= KNN2(CtrainL,k,knnIndex{j}(:,inmodel));
- if(CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1;
- end
- error=error/train_length; % LOOCV error rate
- fs = sum(inmodel); % number of selected features
- end

KNN2.m

- function relustLabel = KNN2(labels,k,diffMat)
- % diffMat already holds squared differences, so only the sum and square root remain
- distanceMat = sqrt(sum(diffMat,2));
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the nearest neighbors
- end

Results

As can be seen, FSKNN2 together with preKNN takes much less time than FSKNN1.
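One way to reproduce this kind of comparison without the DLBCL data is a tic/toc sketch on synthetic data. Everything below is an assumption for illustration: the sizes `N`, `D` and `reps`, the random data, and the inlined 1-NN loops that mirror FSKNN1 and FSKNN2. Actual timings depend on the machine and the data dimensions, so none are quoted here.

```matlab
N = 40; D = 200; reps = 20;                 % hypothetical problem sizes
X = rand(N, D);                             % synthetic features
L = [ones(N/2, 1); 2 * ones(N/2, 1)];       % two synthetic classes
inmodel = rand(1, D) > 0.5;                 % random feature subset

% One-time cache of per-feature squared differences (as in preKNN)
sqDiff = cell(N, 1);
for j = 1:N
    others = true(N, 1); others(j) = false;
    sqDiff{j} = (X(others, :) - repmat(X(j, :), [N-1, 1])).^2;
end

tic;                                        % plain LOOCV 1-NN: recompute each fold
for r = 1:reps
    errPlain = 0;
    for j = 1:N
        others = true(N, 1); others(j) = false;
        d = sum((X(others, inmodel) - repmat(X(j, inmodel), [N-1, 1])).^2, 2);
        [~, ix] = min(d);                   % nearest neighbor
        Ltrain = L(others);
        errPlain = errPlain + (Ltrain(ix) ~= L(j));
    end
end
tPlain = toc;

tic;                                        % fast LOOCV 1-NN: sum cached columns
for r = 1:reps
    errCached = 0;
    for j = 1:N
        others = true(N, 1); others(j) = false;
        d = sum(sqDiff{j}(:, inmodel), 2);
        [~, ix] = min(d);
        Ltrain = L(others);
        errCached = errCached + (Ltrain(ix) ~= L(j));
    end
end
tCached = toc;

fprintf('plain: %.4f s, cached: %.4f s, errors: %d vs %d\n', ...
        tPlain, tCached, errPlain, errCached);
```

Both loops should report the same number of misclassifications, since they rank neighbors by the same squared distances; only the elapsed times differ.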