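Leave-One-Out Cross-Validation (LOOCV)

Leave-One-Out Cross-Validation is the special case of k-fold cross-validation in which k equals the number of samples N. In k-fold cross-validation the data set is split into k subsets; k-1 of them are used for training and the remaining one for testing, then the next subset becomes the test set and the other k-1 the training set, and so on. With LOOCV each subset is a single sample, so every sample serves as the test set exactly once. The goal is to estimate how well the model generalizes and thereby to detect overfitting. The drawback is the long computation time.

When to use it:

When the data set is small, a conventional split into training and validation sets takes data away from a training set that is already small. LOOCV makes full use of the data, because every round trains on all samples but one.
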
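The leave-one-out loop described above can be sketched on a toy data set. This is only a minimal illustration, not the code of this post: the four 2-D points, their labels, and all variable names are made up, and a 1-nearest-neighbor rule stands in for the model.

```matlab
% Toy data (hypothetical): two well-separated classes
X = [0 0; 0 1; 5 5; 5 6];
y = [1; 1; 2; 2];
n = size(X, 1);

errors = 0;
for j = 1:n
    % Hold out sample j, train on the remaining n-1 samples
    trainIdx = true(n, 1);
    trainIdx(j) = false;
    Xtr = X(trainIdx, :);
    ytr = y(trainIdx);

    % 1-NN: predict the label of the closest training sample
    d = sqrt(sum((Xtr - repmat(X(j, :), n - 1, 1)).^2, 2));
    [~, nearest] = min(d);
    if ytr(nearest) ~= y(j)
        errors = errors + 1;
    end
end

% Fraction of held-out samples that were misclassified
looError = errors / n;
```

Each of the n rounds uses n-1 samples for training, which is why the method suits small data sets and why its cost grows with n.
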
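Fast leave-one-out KNN

LOOCV produces N splits, so a single evaluation round has to build N models, which greatly increases the running time. KNN has no real training step, so almost all of that time goes into computing distances. Because the leave-one-out splits always compare the same pairs of samples, the distances between samples (or an intermediate result, such as the per-feature squared differences) can be computed once in advance and stored. During LOOCV they are then simply looked up by index. The code below uses feature selection as a demo to verify the fast leave-one-out KNN.

FSKNN1 is the plain KNN and FSKNN2 is the fast KNN.
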
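Why the intermediate result can be cached: the squared Euclidean distance is a sum over features, so the distance on any feature subset is the sum of the corresponding columns of the squared-difference matrix. A small check of that identity, with made-up data and names (X, mask, sqDiff are assumptions for illustration only):

```matlab
% Toy data (hypothetical): 3 samples, 3 features
X = [0 0 0; 1 2 2; 3 0 4];
mask = logical([1 0 1]);   % feature subset: columns 1 and 3
j = 1;                     % held-out sample
others = [2 3];            % remaining samples

% Computed once, independent of the feature subset
sqDiff = (X(others, :) - repmat(X(j, :), numel(others), 1)).^2;

% Distance on the subset, taken from the cached squared differences
dFast = sqrt(sum(sqDiff(:, mask), 2));

% Same distance computed from scratch on the selected columns
dDirect = sqrt(sum((X(others, mask) - repmat(X(j, mask), numel(others), 1)).^2, 2));
```

Both vectors are identical, so a new feature subset only needs a column selection and a row sum instead of a fresh subtraction and squaring.
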
Main script main.m

- clc
- % Split the DLBCL data into training and test sets
- [train_F,train_L,test_F,test_L] = divide_dlbcl();
- dim = size(train_F,2);
- % Random weight vector; a feature is selected when its weight exceeds choice
- individual = rand(1,dim);
- global choice
- choice = 0.5;
- % Precompute the squared differences once for the fast variant
- global knnIndex
- [knnIndex] = preKNN(individual,train_F);
- % Run both evaluations 100 times
- for i = 1:100
- [error,fs] = FSKNN1(individual,train_F,train_L);
- [error2,fs2] = FSKNN2(individual,train_F,train_L);
- end

Data set split divide_dlbcl.m

- function [train_F,train_L,test_F,test_L] = divide_dlbcl()
- % DLBCL.mat is expected to provide the feature matrix ins and the labels lab
- load DLBCL.mat;
- dataMat=ins;
- len=size(dataMat,1);
- % Min-max normalization of each feature to [0,1]
- maxV = max(dataMat);
- minV = min(dataMat);
- range = maxV-minV;
- range(range==0) = 1; % avoid 0/0 = NaN for constant features
- newdataMat = (dataMat-repmat(minV,[len,1]))./(repmat(range,[len,1]));
- % Assign each sample to one of 10 folds (crossvalind is part of the Bioinformatics Toolbox)
- Indices = crossvalind('Kfold', length(lab), 10);
- % Folds 1-3 form the test set
- site = find(Indices==1|Indices==2|Indices==3);
- test_F = newdataMat(site,:);
- test_L = lab(site);
- % The remaining folds 4-10 form the training set
- site2 = find(Indices~=1&Indices~=2&Indices~=3);
- train_F = newdataMat(site2,:);
- train_L =lab(site2);
- end

Simple KNN

FSKNN1.m

- function [error,fs] = FSKNN1(x,train_F,train_L)
- global choice
- inmodel = x>choice; % select the features whose weight exceeds the threshold
- k=1;
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1); % true = sample belongs to the training part
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtrainL = train_L(flag);
- CtestF = train_f(~flag,:);
- CtestL = train_L(~flag);
- classifyresult= KNN1(CtestF,CtrainF,CtrainL,k);
- if (CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1; % put sample j back
- end
- error=error/train_length; % leave-one-out error rate
- fs = sum(inmodel); % number of selected features
- end

KNN1.m

- function relustLabel = KNN1(inx,data,labels,k)
- %%
- % inx is the test sample, data holds the training samples, labels holds their labels; k is user-chosen, 1 to 3
- %%
- [datarow , datacol] = size(data);
- diffMat = repmat(inx,[datarow,1]) - data ;
- distanceMat = sqrt(sum(diffMat.^2,2)); % Euclidean distance to every training sample
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the len nearest neighbors
- end

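A note on the vote in KNN1: the label is chosen with mode, which returns the smallest value when several labels are equally frequent. With k = 2 a tie is therefore always resolved toward the smaller label. A small sketch with made-up distances and labels (all values here are hypothetical):

```matlab
% Hypothetical labels and distances of five training samples to one query
labels = [1; 2; 2; 1; 2];
dist   = [0.9; 0.2; 0.4; 0.1; 0.8];

% Sort by distance, nearest first
[~, order] = sort(dist, 'ascend');

% k = 3: the neighbors carry labels 1, 2, 2, so the vote is 2
vote3 = mode(labels(order(1:3)));

% k = 2: the neighbors carry labels 1 and 2, a tie; mode returns the smaller label
vote2 = mode(labels(order(1:2)));
```

An odd k avoids such ties in a two-class problem; the demo in this post uses k = 1, where no tie can occur.
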
Fast KNN

preKNN.m

- function [knnIndex] = preKNN(x,train_F)
- % x comes from rand, so x > 0 keeps all features and the cached columns line up with the mask used in FSKNN2
- inmodel = x > 0;
- train_f=train_F(:,inmodel);
- train_length = size(train_F,1);
- flag = true(train_length,1);
- knnIndex = cell(train_length,1);
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainF = train_f(flag,:);
- CtestF = train_f(~flag,:);
- [datarow , ~] = size(CtrainF);
- diffMat = repmat(CtestF,[datarow,1]) - CtrainF ;
- diffMat = diffMat.^2; % per-feature squared differences
- knnIndex{j,1} = diffMat; % cached for the round that holds out sample j
- flag(j) = 1;
- end
- end

FSKNN2.m

- function [error,fs] = FSKNN2(x,train_F,train_L)
- global choice
- inmodel = x>choice; % select the features whose weight exceeds the threshold
- global knnIndex % squared differences cached by preKNN
- k=1;
- train_length = size(train_F,1);
- flag = true(train_length,1);
- error=0;
- for j=1:train_length
- flag(j) = 0; % hold out sample j
- CtrainL = train_L(flag);
- CtestL = train_L(~flag);
- % Reuse the cached squared differences, restricted to the selected features
- classifyresult= KNN2(CtrainL,k,knnIndex{j}(:,inmodel));
- if(CtestL~=classifyresult)
- error=error+1;
- end
- flag(j) = 1; % put sample j back
- end
- error=error/train_length; % leave-one-out error rate
- fs = sum(inmodel); % number of selected features
- end

KNN2.m

- function relustLabel = KNN2(labels,k,diffMat)
- % diffMat already holds squared differences, so only the row sum and square root remain
- distanceMat = sqrt(sum(diffMat,2));
- [B , IX] = sort(distanceMat,'ascend');
- len = min(k,length(B));
- relustLabel = mode(labels(IX(1:len))); % majority vote among the len nearest neighbors
- end

Results

As the timing results show, FSKNN2 together with preKNN takes far less time than FSKNN1.
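
The main script above does not measure time itself. One way such a comparison might be timed is with tic and toc, sketched here on random stand-in data instead of DLBCL.mat. The sizes n, d, reps and all variable names are assumptions for illustration, and the measured numbers will depend on the machine.

```matlab
% Stand-in data (hypothetical): 40 samples, 300 features, 20 repetitions
n = 40; d = 300; reps = 20;
X = rand(n, d);
mask = rand(1, d) > 0.5;   % random feature subset

% Baseline: recompute the squared differences in every evaluation
tic;
for r = 1:reps
    for j = 1:n
        others = [1:j-1, j+1:n];
        diffMat = X(others, mask) - repmat(X(j, mask), n - 1, 1);
        dSlow = sqrt(sum(diffMat.^2, 2));
    end
end
tSlow = toc;

% Cached: compute the squared differences once, outside the timed loop
cache = cell(n, 1);
for j = 1:n
    others = [1:j-1, j+1:n];
    cache{j} = (X(others, :) - repmat(X(j, :), n - 1, 1)).^2;
end

% Each evaluation now only selects columns and sums them
tic;
for r = 1:reps
    for j = 1:n
        dFast = sqrt(sum(cache{j}(:, mask), 2));
    end
end
tFast = toc;

fprintf('recompute: %.4f s, cached: %.4f s\n', tSlow, tFast);
```

The cache pays off when the same data is evaluated many times with different feature subsets, as in feature selection. The price is memory: n matrices of size (n-1) by d are kept.
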