I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target samples. For a detailed description of the algorithm, see the following references:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.

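The libsvm toolbox, developed by Professor Chih-Jen Lin of National Taiwan University and colleagues, provides a MATLAB interface to the SVDD algorithm. Two key parameters, c and g, directly affect the one-class classification result of SVDD. Building on this, the author uses the Whale Optimization Algorithm (WOA) to optimize the parameters of the SVDD algorithm in the libsvm toolbox.
For a detailed description of WOA, see the following reference:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.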
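
For readers who have not seen WOA before, the update rules described in the paper (encircling the best agent, a spiral move toward it, and a random exploration step) can be condensed into a short sketch. This is not the authors' released code: it draws A and C once per agent rather than per dimension, and the sphere objective, bounds, and population settings are placeholders chosen only so the snippet runs on its own.

```matlab
% Condensed WOA sketch for minimization (simplified; see assumptions above)
fobj = @(x) sum(x.^2);            % hypothetical test objective (sphere function)
lb = [-5, -5];  ub = [5, 5];      % hypothetical bounds
nAgents = 10;  maxIter = 50;  dim = numel(lb);
b = 1;                            % constant defining the spiral shape

% Random initial positions inside the bounds
X = repmat(lb, nAgents, 1) + rand(nAgents, dim) .* repmat(ub - lb, nAgents, 1);
bestScore = inf;  bestPos = zeros(1, dim);

for t = 1:maxIter
    % Clamp every agent to the bounds, evaluate it, and update the leader
    for i = 1:nAgents
        X(i,:) = min(max(X(i,:), lb), ub);
        f = fobj(X(i,:));
        if f < bestScore
            bestScore = f;  bestPos = X(i,:);
        end
    end
    a = 2 - (t - 1) * (2 / maxIter);   % decreases linearly from 2 toward 0
    for i = 1:nAgents
        A = 2 * a * rand - a;          % scalar per agent (simplification)
        C = 2 * rand;
        l = 2 * rand - 1;              % random number in [-1, 1]
        if rand < 0.5
            if abs(A) >= 1
                % Exploration: move relative to a randomly chosen agent
                Xrand = X(randi(nAgents), :);
                X(i,:) = Xrand - A * abs(C * Xrand - X(i,:));
            else
                % Exploitation: encircle the best position found so far
                X(i,:) = bestPos - A * abs(C * bestPos - X(i,:));
            end
        else
            % Spiral move toward the best position found so far
            D = abs(bestPos - X(i,:));
            X(i,:) = D * exp(b * l) * cos(2 * pi * l) + bestPos;
        end
    end
end
fprintf('best score %.4g at [%.4g, %.4g]\n', bestScore, bestPos);
```

Because the search is random, the printed values change from run to run; only the general trend (the best score never gets worse) is guaranteed.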

The authors of the algorithm have released their code as open source on MathWorks.

Notes: (1) The author has renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm intelligence optimization algorithms, WOA can easily get trapped in a local optimum. If the optimization result looks abnormal, try running it a few more times.
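
Note (2) can be automated with a simple restart loop that keeps the best result over several runs. The sketch below is only an illustration: a short random search stands in for one optimizer run so that the snippet is self-contained, and the objective and bounds are made up. In the setting of this post, the inner search would be replaced by a call to WOA.

```matlab
% Restart loop: run the optimizer several times and keep the best result
fobj = @(x) (x(1) - 0.5)^2 + (log2(x(2)) - 1)^2;   % hypothetical objective
lb = [10^-3, 2^-4];  ub = [10^0, 2^4];              % hypothetical bounds for c and g

nRuns = 5;  nSamples = 50;
bestScore = inf;  bestPos = [];
for k = 1:nRuns
    % Stand-in for one optimizer run (a short random search).
    % With the post's code this would be:
    %   [runScore, runPos] = WOA(agent, iteration, lb, ub, dim, fobj);
    runScore = inf;  runPos = [];
    for s = 1:nSamples
        pos = lb + rand(1, 2) .* (ub - lb);
        val = fobj(pos);
        if val < runScore
            runScore = val;  runPos = pos;
        end
    end
    fprintf('run %d: best score %.4g\n', k, runScore);
    % Keep the best result across all runs
    if runScore < bestScore
        bestScore = runScore;  bestPos = runPos;
    end
end
fprintf('overall best score %.4g\n', bestScore);
```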

II. Example 1 (the heart_scale data provided with the libsvm toolbox)

1. Data description
The dataset has 13 attributes and 270 samples: 120 positive and 150 negative. In this example the positive samples form the training set, with label 1, and the negative samples form the test set, with label -1.
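
The split described above can be sketched as follows. The prepareData function used in the main program is not shown in this post, so this is only a guess at what it does. Random numbers stand in for the heart_scale features so that the snippet runs without the dataset; with libsvm installed, the labels and features would normally be read with libsvmread.

```matlab
% Stand-in data with the same shape as heart_scale (assumption: random features)
label = [ones(120, 1); -ones(150, 1)];
inst  = rand(270, 13);
% With libsvm available: [label, inst] = libsvmread('heart_scale');

% Positive samples become the training set, negative samples the test set
posIdx = (label == 1);
traindata = inst(posIdx, :);     trainlabel = label(posIdx);
testdata  = inst(~posIdx, :);    testlabel  = label(~posIdx);

fprintf('train: %d x %d, test: %d x %d\n', size(traindata), size(testdata));
```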

2. Main program code

    clc
    clear all
    close all
    addpath(genpath(pwd))
    % Shared as globals, presumably so that woa_obj can access the training data
    global traindata trainlabel
    % heart_scale data
    [traindata, testdata, trainlabel, testlabel] = prepareData;
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 20;        % Maximum number of iterations
    lb = [10^-3,2^-4];     % Lower bounds of 'c' and 'g'
    ub = [10^0,2^4];       % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score,Best_pos,~] = WOA(agent,iteration,lb,ub,dim,fobj);
    % Train the SVDD hypersphere using the optimal parameters
    % (-s 5: SVDD, -t 2: RBF kernel, -q: quiet mode)
    cmd = ['-s 5 -t 2 ','-c ',num2str(Best_pos(1,1)),' -g ', ...
        num2str(Best_pos(1,2)),' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel,accuracy,~] = libsvmpredict(testlabel, testdata, model);
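
One detail worth checking in the command string: by default num2str keeps only a few significant digits (roughly four decimal places), so the c and g actually passed to libsvmtrain may differ slightly from the values WOA found. A possible alternative is sprintf with an explicit precision. The Best_pos values below are invented placeholders, not results from this post.

```matlab
Best_pos = [0.123456789, 3.987654321];   % hypothetical optimizer output

% Command string built as in the main program (default num2str precision)
cmdShort = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1)), ' -g ', ...
    num2str(Best_pos(2)), ' -q'];

% Command string with up to 10 significant digits per parameter
cmdFull = sprintf('-s 5 -t 2 -c %.10g -g %.10g -q', Best_pos(1), Best_pos(2));

disp(cmdShort)
disp(cmdFull)
```

Whether the lost digits matter depends on how sensitive the model is to c and g; for a coarse search range the difference is usually small.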

Output of the last iteration and the final classification result:

    ans =
       19.0000    0.0667
    Accuracy = 80% (96/120) (classification)
    Accuracy = 66.6667% (80/120) (classification)
    Accuracy = 60% (72/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 53.3333% (64/120) (classification)
    Accuracy = 54.1667% (65/120) (classification)
    Accuracy = 42.5% (51/120) (classification)
    Accuracy = 35% (42/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 35% (42/120) (classification)
    ans =
       20.0000    0.0667
    Accuracy = 100% (150/150) (classification)
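
As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 93.33% on the training set and 100% on the test set. Since the test set contains only negative samples, the 100% figure measures the rejection of non-target samples only.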
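
The 93.33% training accuracy does not appear as an Accuracy line in the log above; the ten lines shown are the agents of the last iteration, none of which reaches it. It appears to come from the second value of the last "ans" row (0.0667), under the assumption that this value is the best fitness found so far and that woa_obj returns 1 minus the training accuracy. woa_obj is not shown in the post, so the relation below is an inference, not a documented fact.

```matlab
% Assumption: fitness = 1 - (training accuracy / 100); woa_obj is not shown in the post
nTrain = 120;              % number of training samples in Example 1
bestFitness = 0.0667;      % second value of the last "ans" row in the log

trainAccuracy = (1 - bestFitness) * 100;          % in percent
nCorrect = round(nTrain * (1 - bestFitness));     % implied number of correct samples

fprintf('Training accuracy: %.2f%% (%d/%d)\n', trainAccuracy, nCorrect, nTrain);
% prints: Training accuracy: 93.33% (112/120)
```

The same relation is consistent with Example 2, where the logged value 0.0025 matches the reported 99.75% (399/400).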

III. Example 2 (industrial process data)

1. Data description
Data from an industrial process are used. The dataset has 10 attributes. The training set has 400 positive samples, and the test set has 80 samples (the first 40 are positive and the last 40 are negative).

2. Main program code

    clc
    clear all
    addpath(genpath(pwd))
    % Shared as globals, presumably so that woa_obj can access the training data
    global traindata trainlabel
    % Industrial process data
    % (expected to define traindata, trainlabel, testdata and testlabel)
    load(fullfile('.', 'data', 'data_2.mat'))
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 30;        % Maximum number of iterations
    lb = [10^-3,2^-7];     % Lower bounds of 'c' and 'g'
    ub = [10^0,2^7];       % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score,Best_pos,~] = WOA(agent,iteration,lb,ub,dim,fobj);
    % Train the SVDD hypersphere using the optimal parameters
    % (-s 5: SVDD, -t 2: RBF kernel, -q: quiet mode)
    cmd = ['-s 5 -t 2 ','-c ',num2str(Best_pos(1,1)),' -g ', ...
        num2str(Best_pos(1,2)),' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel,accuracy,~] = libsvmpredict(testlabel, testdata, model);
    % Visualize the results
    plotResult(testlabel,predictlabel)

Output of the last iteration and the final classification result:

    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    ans =
       30.0000    0.0025
    Accuracy = 93.75% (75/80) (classification)

As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 99.75% on the training set and 93.75% on the test set.
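
Because this test set mixes 40 positive and 40 negative samples, the overall figure can be split into a per-class accuracy, which shows whether the errors are rejected targets or accepted non-targets. The sketch below uses the label layout from the data description. The predictions are invented: only the total of 5 errors matches the log (75/80), while their positions and their split between the two classes are made up for illustration.

```matlab
% Label layout from the data description: first 40 positive, last 40 negative
testlabel = [ones(40, 1); -ones(40, 1)];

% Hypothetical predictions (error positions are invented, not from the post)
predictlabel = testlabel;
predictlabel([3 17 29]) = -1;     % three positives rejected (made-up indices)
predictlabel([52 71])   =  1;     % two negatives accepted (made-up indices)

isPos = (testlabel == 1);
overallAcc = mean(predictlabel == testlabel) * 100;
posAcc = mean(predictlabel(isPos) == 1) * 100;     % target samples accepted
negAcc = mean(predictlabel(~isPos) == -1) * 100;   % non-target samples rejected

fprintf('Overall %.2f%%, positives %.2f%%, negatives %.2f%%\n', ...
    overallAcc, posAcc, negAcc);
```

With the real predictlabel returned by libsvmpredict, the same three lines give the actual breakdown.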
The visualized results are as follows:
[Figure: output of plotResult(testlabel, predictlabel)]