Leave-One-Out Cross-Validation (LOOCV)
Leave-one-out is short for Leave-One-Out Cross-Validation. It is easiest to understand as the extreme case of k-fold cross-validation where k equals the number of samples N: the data set is split into N parts of one sample each, the model is trained on the remaining N-1 samples and tested on the held-out sample, and this is repeated until every sample has served as the test set exactly once. Its main purpose is to guard against overfitting and to estimate the model's generalization ability; the drawback is a long computation time.
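To make the procedure concrete, here is a minimal Python/NumPy sketch of LOOCV wrapped around a 1-nearest-neighbour classifier (the function and array names are illustrative, not from the post):

```python
import numpy as np

def loocv_error_1nn(X, y):
    """LOOCV error of a 1-NN classifier: each sample is held out once."""
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                      # hold sample i out
        # squared Euclidean distances from the held-out sample to the rest
        d2 = ((X[mask] - X[i]) ** 2).sum(axis=1)
        pred = y[mask][np.argmin(d2)]        # label of the nearest neighbour
        errors += (pred != y[i])
    return errors / n

X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])
print(loocv_error_1nn(X, y))  # 0.0: every point's nearest neighbour shares its label
```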
When to use it:

When the data set is small. With an ordinary train/validation split, data that is already scarce loses a further slice to the validation set, leaving even less for training. LOOCV makes full use of the data: all but one sample are available for training on every pass.
Fast LOOCV for KNN

Because LOOCV requires N splits and produces N batches of data, one evaluation round has to train N models, which greatly increases the running time. To solve this, we can exploit a property of LOOCV: the distances between samples (or the intermediate values the distances are built from) never change, so they can be computed in advance, stored, and simply looked up from the index on every LOOCV pass. The code below uses feature selection as a demo to validate the fast LOOCV KNN.

FSKNN1 is the plain KNN; FSKNN2 is the fast KNN.
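The caching idea can be sketched in Python/NumPy before reading the MATLAB version: precompute the per-feature squared differences once, then evaluate any feature subset by summing only the selected columns. This is a sketch under the assumption of a 1-NN classifier; all names are illustrative, and the final check confirms the cached version agrees with recomputing distances from scratch.

```python
import numpy as np

def pre_knn(X):
    """For each held-out sample i, cache the per-feature squared differences
    to every other sample (an (n-1) x d matrix). Computed only once."""
    n = X.shape[0]
    cache = []
    for i in range(n):
        others = np.delete(X, i, axis=0)
        cache.append((others - X[i]) ** 2)
    return cache

def loocv_error_fast(y, cache, selected):
    """LOOCV 1-NN error for a feature subset, using the cached squares."""
    n = len(y)
    errors = 0
    for i in range(n):
        d2 = cache[i][:, selected].sum(axis=1)   # sum only selected columns
        pred = np.delete(y, i)[np.argmin(d2)]
        errors += (pred != y[i])
    return errors / n

def loocv_error_naive(X, y, selected):
    """Same error, recomputing distances from scratch on every pass."""
    n = len(y)
    errors = 0
    for i in range(n):
        others = np.delete(X, i, axis=0)[:, selected]
        d2 = ((others - X[i, selected]) ** 2).sum(axis=1)
        pred = np.delete(y, i)[np.argmin(d2)]
        errors += (pred != y[i])
    return errors / n

rng = np.random.default_rng(0)
X = rng.random((20, 5))
y = rng.integers(0, 2, 20)
cache = pre_knn(X)
selected = np.array([True, False, True, True, False])
print(loocv_error_fast(y, cache, selected) ==
      loocv_error_naive(X, y, selected))  # True: identical errors, cache computed once
```

The saving comes from moving the subtract-and-square work out of the evaluation loop: when many feature subsets are scored (as in feature selection), each evaluation reduces to a column selection and a row sum.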
Main script: main.m
clc; clear;
[train_F,train_L,test_F,test_L] = divide_dlbcl();
dim = size(train_F,2);
individual = rand(1,dim);                  % random feature weights in (0,1)
global choice
choice = 0.5;                              % selection threshold on the weights
global knnIndex
[knnIndex] = preKNN(individual,train_F);   % precompute once, reused by FSKNN2
tic
for i = 1:100
    [error1,fs1] = FSKNN1(individual,train_F,train_L);   % plain KNN
end
toc
tic
for i = 1:100
    [error2,fs2] = FSKNN2(individual,train_F,train_L);   % fast KNN
end
toc
Dataset split: divide_dlbcl.m
function [train_F,train_L,test_F,test_L] = divide_dlbcl()
load DLBCL.mat;              % provides the sample matrix `ins` and labels `lab`
dataMat = ins;
len = size(dataMat,1);
% min-max normalization to [0,1]
maxV = max(dataMat);
minV = min(dataMat);
range = maxV - minV;
newdataMat = (dataMat - repmat(minV,[len,1])) ./ repmat(range,[len,1]);
% 10-fold indices: folds 1-3 (~30%) form the test set, the rest the training set
Indices = crossvalind('Kfold', length(lab), 10);
site = find(Indices==1 | Indices==2 | Indices==3);
test_F = newdataMat(site,:);
test_L = lab(site);
site2 = find(Indices~=1 & Indices~=2 & Indices~=3);
train_F = newdataMat(site2,:);
train_L = lab(site2);
end
" a5 x/ V H7 O, s; D& ^! T1 t, E
3 V4 @; v) w0 h$ \简单KNN# E9 b0 z1 n9 c
, F* D; e1 y" u. `2 qFSKNN1.m% a/ o! H# L( H$ b& w' I8 r5 M
function [error,fs] = FSKNN1(x,train_F,train_L)
global choice
inmodel = x > choice;        % select features whose weight exceeds the threshold
k = 1;
train_f = train_F(:,inmodel);
train_length = size(train_F,1);
flag = true(train_length,1);
error = 0;
for j = 1:train_length
    flag(j) = 0;             % hold sample j out
    CtrainF = train_f(flag,:);
    CtrainL = train_L(flag);
    CtestF = train_f(~flag,:);
    CtestL = train_L(~flag);
    classifyresult = KNN1(CtestF,CtrainF,CtrainL,k);
    if CtestL ~= classifyresult
        error = error + 1;
    end
    flag(j) = 1;             % put it back
end
error = error/train_length;  % LOOCV error rate
fs = sum(inmodel);           % number of selected features
end
KNN1.m
function resultLabel = KNN1(inx,data,labels,k)
%%
% inx: test sample; data: training samples; labels: training labels; k: 1-3
%%
[datarow, ~] = size(data);
diffMat = repmat(inx,[datarow,1]) - data;
distanceMat = sqrt(sum(diffMat.^2,2));    % Euclidean distances
[B, IX] = sort(distanceMat,'ascend');
len = min(k,length(B));
resultLabel = mode(labels(IX(1:len)));    % majority label of the k nearest
end
Fast KNN

preKNN.m
function [knnIndex] = preKNN(x,train_F)
inmodel = x > 0;             % keep every feature: the cache must cover all columns
train_f = train_F(:,inmodel);
train_length = size(train_F,1);
flag = true(train_length,1);
knnIndex = cell(train_length,1);
for j = 1:train_length
    flag(j) = 0;             % hold sample j out
    CtrainF = train_f(flag,:);
    CtestF = train_f(~flag,:);
    [datarow, ~] = size(CtrainF);
    diffMat = repmat(CtestF,[datarow,1]) - CtrainF;
    diffMat = diffMat.^2;    % per-feature squared differences, not yet summed
    knnIndex{j,1} = diffMat; % cached: any feature subset is just a column selection
    flag(j) = 1;
end
end
FSKNN2.m
function [error,fs] = FSKNN2(x,train_F,train_L)
global choice
inmodel = x > choice;        % select features whose weight exceeds the threshold
global knnIndex
k = 1;
train_length = size(train_F,1);
flag = true(train_length,1);
error = 0;
for j = 1:train_length
    flag(j) = 0;
    CtrainL = train_L(flag);
    CtestL = train_L(~flag);
    % look up the cached squared differences and keep only the selected columns
    classifyresult = KNN2(CtrainL,k,knnIndex{j}(:,inmodel));
    if CtestL ~= classifyresult
        error = error + 1;
    end
    flag(j) = 1;
end
error = error/train_length;
fs = sum(inmodel);
end
KNN2.m
function resultLabel = KNN2(labels,k,diffMat)
distanceMat = sqrt(sum(diffMat,2));       % squares are precomputed; just sum them
[B, IX] = sort(distanceMat,'ascend');
len = min(k,length(B));
resultLabel = mode(labels(IX(1:len)));
end
Results

[result screenshot: timing comparison of FSKNN1 vs FSKNN2 + preKNN]
As the screenshot shows, FSKNN2 with preKNN takes far less time than FSKNN1.