I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that distinguishes target samples from non-target samples. For a detailed description of the algorithm, see the following references:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
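The libsvm toolbox developed by Prof. Chih-Jen Lin and colleagues at National Taiwan University provides a MATLAB interface to the SVDD algorithm. Two key parameters, c and g, directly affect the one-class classification result. Building on this toolbox, the author introduces the Whale Optimization Algorithm (WOA) to optimize these two parameters of the SVDD algorithm in libsvm.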
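For orientation, the roles of c and g can be read from the soft-margin SVDD problem in Tax and Duin (2004), combined with the Gaussian (RBF) kernel selected by the option -t 2. The notation below (center a, radius R, slack variables \xi_i, sample count n, feature map \phi) is the usual textbook form and is an assumption of this sketch; the exact scaling of C inside the toolbox may differ.

```latex
\[
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1,\dots,n
\end{aligned}
\]
\[
K(x, y) = \langle \phi(x), \phi(y) \rangle = \exp\!\left(-g\,\lVert x - y \rVert^2\right)
\]
```

In this form, a smaller C makes it cheaper to leave training samples outside the hypersphere, while a larger g makes the kernel more local and tightens the boundary around the training data.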
For a detailed description of WOA, see the following reference:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.
The authors of the algorithm have released their code as open source on MathWorks.
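To give a feel for what the optimizer does, here is a minimal WOA sketch for minimization, written from the update rules in Mirjalili and Lewis (2016). It is not the authors' released code: the name woaSketch is hypothetical, the coefficients A and C are drawn as scalars per agent (a simplification), the spiral constant b is fixed at 1, and the demo uses a toy sphere function rather than the SVDD objective. It assumes MATLAB R2016b or later (local functions in scripts, implicit expansion).

```matlab
% Minimal WOA sketch (minimization), demonstrated on a toy sphere function.
rng(1);                                    % fixed seed so the demo is repeatable
sphere = @(x) sum(x.^2);                   % toy objective, not the SVDD objective
[bestScore, bestPos, curve] = woaSketch(10, 50, [-5 -5], [5 5], 2, sphere);
fprintf('best score %.4g at [%s]\n', bestScore, num2str(bestPos));

function [bestScore, bestPos, curve] = woaSketch(nAgents, maxIter, lb, ub, dim, fobj)
    X = lb + rand(nAgents, dim) .* (ub - lb);    % random start inside the bounds
    bestScore = inf;
    bestPos = zeros(1, dim);
    curve = zeros(1, maxIter);                   % best score after each iteration
    for t = 1:maxIter
        % Evaluate all agents and remember the best solution found so far
        for i = 1:nAgents
            X(i, :) = min(max(X(i, :), lb), ub); % clip to the bounds
            f = fobj(X(i, :));
            if f < bestScore
                bestScore = f;
                bestPos = X(i, :);
            end
        end
        a = 2 - 2 * (t - 1) / maxIter;           % decreases linearly from 2 toward 0
        for i = 1:nAgents
            A = 2 * a * rand - a;                % step coefficient in [-a, a]
            C = 2 * rand;
            if rand < 0.5
                if abs(A) < 1
                    ref = bestPos;               % exploit: encircle the best agent
                else
                    ref = X(randi(nAgents), :);  % explore: move relative to a random agent
                end
                X(i, :) = ref - A * abs(C * ref - X(i, :));
            else
                l = 2 * rand - 1;                % l in [-1, 1]
                D = abs(bestPos - X(i, :));
                X(i, :) = D * exp(l) * cos(2 * pi * l) + bestPos;  % spiral move, b = 1
            end
        end
        curve(t) = bestScore;
    end
end
```

The argument order mirrors the call WOA(agent, iteration, lb, ub, dim, fobj) used in the main programs of this post, so the sketch can be read side by side with them.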
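Notes:
(1) The author has renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm intelligence optimization algorithms, WOA is prone to getting trapped in local optima. If the optimization result looks abnormal, try running it a few more times.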
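The advice in note (2) can be automated by running the optimizer several times and keeping the best run. The sketch below is one possible way to do that. The helper names bestOfRuns and randomSearch and the toy objective are hypothetical, and randomSearch merely stands in for a single WOA run so that the example works without the toolbox.

```matlab
% Keep the best of several independent runs of a stochastic optimizer.
rng(0);
lb = [1e-3, 2^-4];                                 % bounds taken from Example 1
ub = [1, 2^4];
toyObj = @(p) (log10(p(1)) + 1)^2 + log2(p(2))^2;  % hypothetical stand-in objective
runOnce = @() randomSearch(toyObj, lb, ub, 50);    % stand-in for one WOA run
[bestScore, bestPos, scores] = bestOfRuns(runOnce, 5);
disp(scores);   % score of each run; a wide spread hints at local optima

function [bestScore, bestPos, scores] = bestOfRuns(runOnce, nRuns)
    % runOnce is a handle returning [score, position] for one optimizer run
    bestScore = inf;
    bestPos = [];
    scores = zeros(1, nRuns);
    for k = 1:nRuns
        [scores(k), pos] = runOnce();
        if scores(k) < bestScore
            bestScore = scores(k);
            bestPos = pos;
        end
    end
end

function [score, pos] = randomSearch(obj, lb, ub, nSamples)
    % Uniform random search inside the bounds (placeholder optimizer)
    score = inf;
    pos = lb;
    for k = 1:nSamples
        cand = lb + rand(size(lb)) .* (ub - lb);
        f = obj(cand);
        if f < score
            score = f;
            pos = cand;
        end
    end
end
```

With the toolbox on the path, the stand-in could be swapped for runOnce = @() WOA(agent, iteration, lb, ub, dim, fobj), since the first two outputs of WOA are the best score and the best position.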
II. Example 1 (heart_scale data provided with the libsvm toolbox)
1. Data description
The dataset has 13 attributes and 270 samples, of which 120 are positive and 150 are negative. In this example, the positive samples form the training set with label 1, and the negative samples form the test set with label -1.
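The split described above is done by a prepareData function that is not listed in the post. As a rough sketch of the idea only, the hypothetical helper splitOneClass below sends label 1 to the training set and everything else to the test set. The data here is synthetic so that the block runs on its own; the sample counts and the 13 attributes follow the description above.

```matlab
% One-class split: positives become the training set, negatives the test set.
% Synthetic stand-in data; with libsvm on the path the real file could be read
% with [labels, data] = libsvmread('heart_scale');
rng(0);
labels = [ones(120, 1); -ones(150, 1)];
labels = labels(randperm(numel(labels)));   % shuffle the class order
data = rand(numel(labels), 13);             % 13 attributes, random values

[traindata, testdata, trainlabel, testlabel] = splitOneClass(labels, data);
fprintf('train: %d x %d, test: %d x %d\n', size(traindata), size(testdata));

function [trainX, testX, trainY, testY] = splitOneClass(labels, data)
    isTarget = labels == 1;                 % target (positive) class
    trainX = data(isTarget, :);
    trainY = labels(isTarget);
    testX = data(~isTarget, :);
    testY = labels(~isTarget);
end
```

The output order matches the call [traindata, testdata, trainlabel, testlabel] = prepareData used in the main program.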

2. Main program code

    clc
    clear all
    close all
    addpath(genpath(pwd))
    global traindata trainlabel
    % heart_scale data
    [traindata, testdata, trainlabel, testlabel] = prepareData;
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 20;        % Maximum number of iterations
    lb = [10^-3, 2^-4];    % Lower bounds of 'c' and 'g'
    ub = [10^0, 2^4];      % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
    % Train the SVDD hypersphere using the optimal parameters
    % (-s 5: SVDD, -t 2: RBF kernel, -q: quiet mode)
    cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
        num2str(Best_pos(1,2)), ' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
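The objective function woa_obj is not listed in the post. The logs suggest that it trains an SVDD model for each candidate [c, g], predicts on the training set, and returns the training error rate: every evaluation prints an accuracy out of 120 or 400 training samples, and the reported best scores equal one minus the training accuracy. The following is a sketch under that assumption. The names makeErrorObjective and errorRate are hypothetical, and stub functions replace libsvmtrain and libsvmpredict so that the block runs without the toolbox.

```matlab
% Hypothetical objective: training error rate of an SVDD model for a given [c, g].
% The stubs below imitate the toolbox call signatures; they do not train anything.
stubTrain = @(y, X, cmd) struct('cmd', cmd);
stubPredict = @(y, X, model) deal(y, [80; 0; 0], zeros(size(y)));
fobj = makeErrorObjective(stubTrain, stubPredict, rand(5, 2), ones(5, 1));
err = fobj([0.1, 0.5]);    % the stub always reports 80% accuracy

function fobj = makeErrorObjective(trainFcn, predictFcn, X, y)
    % Bind the training data into a handle that takes only the position [c, g]
    fobj = @(pos) errorRate(pos, trainFcn, predictFcn, X, y);
end

function err = errorRate(pos, trainFcn, predictFcn, X, y)
    % Same option string as the main program: SVDD (-s 5) with an RBF kernel (-t 2)
    cmd = sprintf('-s 5 -t 2 -c %.10g -g %.10g -q', pos(1), pos(2));
    model = trainFcn(y, X, cmd);
    [~, acc, ~] = predictFcn(y, X, model);   % acc(1) is the accuracy in percent
    err = 1 - acc(1) / 100;                  % WOA minimizes this value
end
```

With the toolbox installed, fobj = makeErrorObjective(@libsvmtrain, @libsvmpredict, traindata, trainlabel) would give a handle that can be passed to WOA without global variables. Note that an objective based only on training accuracy does not penalize a hypersphere that is too loose, which is one reason to check the result on a test set that contains negative samples.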

Results of the last iteration and the final classification result:

    ans =
       19.0000    0.0667
    Accuracy = 80% (96/120) (classification)
    Accuracy = 66.6667% (80/120) (classification)
    Accuracy = 60% (72/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 53.3333% (64/120) (classification)
    Accuracy = 54.1667% (65/120) (classification)
    Accuracy = 42.5% (51/120) (classification)
    Accuracy = 35% (42/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 35% (42/120) (classification)
    ans =
       20.0000    0.0667
    Accuracy = 100% (150/150) (classification)

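As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 93.33% on the training set and 100% on the test set. The training accuracy is consistent with the value 0.0667 printed at iteration 20, which appears to be the training error rate (1 - 0.0667 = 0.9333). Because the test set here contains only negative samples, the 100% measures how well non-target samples are rejected.
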
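III. Example 2 (industrial process data)
1. Data description
This example uses data from an industrial process. The dataset has 10 attributes. The training set contains 400 positive samples, and the test set contains 80 samples (the first 40 are positive and the last 40 are negative).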

2. Main program code

    clc
    clear all
    addpath(genpath(pwd))
    global traindata trainlabel
    % Industrial process data (the MAT-file is expected to provide
    % traindata, trainlabel, testdata and testlabel)
    load(fullfile('.', 'data', 'data_2.mat'))
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 30;        % Maximum number of iterations
    lb = [10^-3, 2^-7];    % Lower bounds of 'c' and 'g'
    ub = [10^0, 2^7];      % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
    % Train the SVDD hypersphere using the optimal parameters
    % (-s 5: SVDD, -t 2: RBF kernel, -q: quiet mode)
    cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
        num2str(Best_pos(1,2)), ' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
    % Visualize the results
    plotResult(testlabel, predictlabel)
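The plotResult helper called at the end is not listed in the post. As an illustration of what such a helper might do, the hypothetical function plotLabels below overlays true and predicted labels and marks the misclassified samples. The label layout follows the data description above, but the positions of the five errors are made up for the demo.

```matlab
% Hypothetical stand-in for plotResult: overlay true and predicted labels.
testlabel = [ones(40, 1); -ones(40, 1)];    % layout from the data description
predictlabel = testlabel;
flipped = [3 17 45 60 71];                  % made-up error positions
predictlabel(flipped) = -predictlabel(flipped);

[nWrong, h] = plotLabels(testlabel, predictlabel);
fprintf('%d of %d test samples misclassified\n', nWrong, numel(testlabel));

function [nWrong, h] = plotLabels(trueLabel, predLabel)
    wrong = trueLabel(:) ~= predLabel(:);
    nWrong = sum(wrong);
    idx = (1:numel(trueLabel))';
    h = figure('Visible', 'off');           % remove 'Visible','off' to show the window
    plot(idx, trueLabel(:), 'bo');
    hold on
    plot(idx, predLabel(:), 'r*');
    plot(idx(wrong), predLabel(wrong), 'ks', 'MarkerSize', 10);  % highlight errors
    hold off
    ylim([-1.5 1.5]);
    xlabel('Test sample');
    ylabel('Label');
    legend('True label', 'Predicted label', 'Misclassified', 'Location', 'best');
end
```

Because the first 40 test samples are positive and the last 40 negative, a plot of this kind also shows at a glance whether the errors are false rejections of target samples or false acceptances of non-target samples.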

Results of the last iteration and the final classification result:

    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    ans =
       30.0000    0.0025
    Accuracy = 93.75% (75/80) (classification)

As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 99.75% on the training set and 93.75% on the test set.
The visualization of the results is as follows:
[Figure: visualization of the test-set classification results]