|
|
I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target samples. For a detailed description of the algorithm, see the following references:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
The libsvm toolbox, developed by Prof. Chih-Jen Lin (林智仁) of National Taiwan University and colleagues, provides a MATLAB interface to the SVDD algorithm. Two key parameters, c and g, directly affect the one-class classification result of SVDD. Building on this toolbox, the author uses the Whale Optimization Algorithm (WOA) to optimize the parameters of the SVDD algorithm in libsvm.
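For orientation, the SVDD problem from the Tax and Duin references can be written as below. The notation is chosen here for illustration: a is the centre of the hypersphere, R its radius, and the ξ_i are slack variables. It is assumed that the toolbox parameter c plays the role of C and that g is the width parameter of the Gaussian kernel selected by `-t 2`.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
\lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,N,

K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \exp\!\left(-g\,\lVert x_i - x_j \rVert^2\right)
```

A larger C penalizes training samples that fall outside the hypersphere more heavily, while g controls how tightly the boundary follows the training data.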
For a detailed description of WOA, see the following reference:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.

The authors of the algorithm have published their source code on MathWorks.
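To make the search procedure concrete, here is a minimal, simplified sketch of the WOA update rules described in the paper cited above. It is not the authors' published code. The sphere objective, the bounds, and all variable names are assumptions made for illustration, and the coefficients A and C are drawn once per agent rather than per dimension.

```matlab
% Simplified WOA sketch (minimization) on a toy objective, not the SVDD objective.
rng(1);                                  % fixed seed so the run is repeatable
fobj = @(x) sum(x.^2);                   % hypothetical test objective (sphere)
lb = [-5, -5];  ub = [5, 5];             % search bounds
nAgents = 10;  maxIter = 50;  dim = 2;
b = 1;                                   % spiral shape constant

% Random initial positions inside the bounds
X = repmat(lb, nAgents, 1) + rand(nAgents, dim) .* repmat(ub - lb, nAgents, 1);
bestScore = inf;  bestPos = zeros(1, dim);
curve = zeros(1, maxIter);               % best-so-far objective per iteration

for t = 1:maxIter
    % Clip to the bounds, evaluate, and update the best-so-far solution
    for i = 1:nAgents
        X(i, :) = min(max(X(i, :), lb), ub);
        f = fobj(X(i, :));
        if f < bestScore
            bestScore = f;  bestPos = X(i, :);
        end
    end
    a = 2 - 2 * (t - 1) / (maxIter - 1); % decreases linearly from 2 to 0
    for i = 1:nAgents
        A = 2 * a * rand - a;  C = 2 * rand;
        if rand < 0.5
            if abs(A) < 1
                ref = bestPos;               % exploit: encircle the best agent
            else
                ref = X(randi(nAgents), :);  % explore: move relative to a random agent
            end
            X(i, :) = ref - A * abs(C * ref - X(i, :));
        else
            l = 2 * rand - 1;                % l in [-1, 1]
            X(i, :) = abs(bestPos - X(i, :)) * exp(b * l) * cos(2 * pi * l) + bestPos;
        end
    end
    curve(t) = bestScore;
end
fprintf('best score %.3g at [%.3g, %.3g]\n', bestScore, bestPos(1), bestPos(2));
```

Because only the best-so-far value is recorded, `curve` never increases. The swarm can still stall early, which is why repeated runs are suggested in the notes.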
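
Notes: (1) The author has renamed the libsvm toolbox functions svmtrain and svmpredict to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm intelligence optimization algorithms, WOA is prone to getting trapped in local optima. If the optimization result looks abnormal, try running it a few more times.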
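The advice in note (2) can be automated by running the optimizer several times and keeping the best result. The sketch below uses a plain random search as a stand-in for one optimizer run so that it executes without the toolbox. The objective `fobj`, the run count, and the sample count are hypothetical, and the commented WOA call simply reuses the signature from the main program.

```matlab
% Multi-restart wrapper: keep the best result over several independent runs.
fobj = @(p) (log10(p(1)) + 1)^2 + (log2(p(2)) - 1)^2;  % hypothetical objective of [c, g]
lb = [10^-3, 2^-4];  ub = [10^0, 2^4];                 % bounds used in Example 1
nRuns = 5;                                             % number of independent restarts
nSamples = 30;                                         % samples per stand-in run
bestScore = inf;  bestPos = [];
scores = zeros(1, nRuns);

for r = 1:nRuns
    rng(r);                                            % different seed for each run
    % Stand-in for one optimizer run. With the toolbox on the path this would be:
    %   [score, pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
    cand = repmat(lb, nSamples, 1) + rand(nSamples, 2) .* repmat(ub - lb, nSamples, 1);
    vals = zeros(nSamples, 1);
    for k = 1:nSamples
        vals(k) = fobj(cand(k, :));
    end
    [score, idx] = min(vals);
    pos = cand(idx, :);

    scores(r) = score;
    if score < bestScore                               % keep the best run so far
        bestScore = score;  bestPos = pos;
    end
end
fprintf('best of %d runs: %.4g\n', nRuns, bestScore);
```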

II. Example 1 (heart_scale data provided with the libsvm toolbox)

1. Data description
The dataset has 13 attributes and 270 samples: 120 positive and 150 negative. In this example the positive samples form the training set, with label 1, and the negative samples form the test set, with label -1.
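The body of prepareData is not shown in this post. A split like the one described could be sketched as follows. The synthetic matrix only mimics the shape of heart_scale so that the snippet runs on its own. With libsvm on the path, the data would come from libsvmread instead.

```matlab
% With libsvm on the path, the real data would be loaded with
%   [label, inst] = libsvmread('heart_scale');
% Synthetic stand-in with the same shape (270 x 13, 120 positive, 150 negative):
rng(0);
inst  = rand(270, 13);
label = [ones(120, 1); -ones(150, 1)];
label = label(randperm(270));            % shuffle so the classes are interleaved

isPos      = (label == 1);
traindata  = inst(isPos, :);             % positive samples -> training set
trainlabel = label(isPos);
testdata   = inst(~isPos, :);            % negative samples -> test set
testlabel  = label(~isPos);
fprintf('train: %d samples, test: %d samples\n', size(traindata, 1), size(testdata, 1));
```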

2. Main program code

clc
clear all
close all
addpath(genpath(pwd))
global traindata trainlabel
% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;
% Parameter settings of WOA
agent = 10;            % Number of search agents
iteration = 20;        % Maximum number of iterations
lb = [10^-3,2^-4];     % Lower bounds of 'c' and 'g'
ub = [10^0,2^4];       % Upper bounds of 'c' and 'g'
dim = 2;               % Number of parameters to optimize
fobj = @woa_obj;       % Objective function
% Parameter optimization using WOA
[Best_score,Best_pos,~] = WOA(agent,iteration,lb,ub,dim,fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ','-c ',num2str(Best_pos(1,1)),' -g ', ...
    num2str(Best_pos(1,2)),' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel,accuracy,~] = libsvmpredict(testlabel, testdata, model);
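One detail of the command string above is that num2str keeps only about five significant digits by default, so the final model may be trained with slightly rounded parameters. Whether this matters depends on how woa_obj builds its own command, which is not shown here. The sketch below compares the two formatting options using hypothetical parameter values.

```matlab
% Compare default num2str formatting with an explicit high-precision format.
p = [0.123456789, 3.14159265358979];     % hypothetical optimizer output [c, g]

% Same construction as in the main program: default num2str precision
cmdShort = ['-s 5 -t 2 -c ', num2str(p(1)), ' -g ', num2str(p(2)), ' -q'];

% Alternative: keep up to 15 significant digits
cmdFull = sprintf('-s 5 -t 2 -c %.15g -g %.15g -q', p(1), p(2));

disp(cmdShort);
disp(cmdFull);
```

If the objective function formats its parameters the same way as the main program, both see the same rounded values and the reported best score still describes the final model.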

Results of the last iteration and the final classification result:

ans =
19.0000 0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
20.0000 0.0667
Accuracy = 100% (150/150) (classification)

As can be seen, the SVDD model built with the optimized parameters achieves an accuracy of 93.33% on the training set and 100% on the test set.
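The 93.33% figure does not appear directly in the log. It follows from the best objective value, 0.0667, if woa_obj returns 1 minus the training accuracy. That definition is an assumption, since woa_obj is not shown here. A quick check of the arithmetic:

```matlab
% Recover the implied training accuracy from the best objective value.
bestScore = 0.0667;                           % best objective value in the log above
nTrain    = 120;                              % training samples in Example 1
trainAcc  = 100 * (1 - bestScore);            % implied training accuracy in percent
nCorrect  = round(nTrain * (1 - bestScore));  % implied number of correct samples
fprintf('implied training accuracy: %.2f%% (%d/%d)\n', trainAcc, nCorrect, nTrain);
```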

III. Example 2 (industrial process data)

1. Data description
This example uses data from an industrial process. The dataset has 10 attributes. The training set contains 400 positive samples, and the test set contains 80 samples (the first 40 are positive and the last 40 are negative).

2. Main program code

clc
clear all
addpath(genpath(pwd))
global traindata trainlabel
% Industrial process data
load ('.\data\data_2.mat')
% Parameter settings of WOA
agent = 10;            % Number of search agents
iteration = 30;        % Maximum number of iterations
lb = [10^-3,2^-7];     % Lower bounds of 'c' and 'g'
ub = [10^0,2^7];       % Upper bounds of 'c' and 'g'
dim = 2;               % Number of parameters to optimize
fobj = @woa_obj;       % Objective function
% Parameter optimization using WOA
[Best_score,Best_pos,~] = WOA(agent,iteration,lb,ub,dim,fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ','-c ',num2str(Best_pos(1,1)),' -g ', ...
    num2str(Best_pos(1,2)),' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel,accuracy,~] = libsvmpredict(testlabel, testdata, model);
% Visualize the results
plotResult(testlabel,predictlabel)

Results of the last iteration and the final classification result:

Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
30.0000 0.0025
Accuracy = 93.75% (75/80) (classification)

As can be seen, the SVDD model built with the optimized parameters achieves an accuracy of 99.75% on the training set and 93.75% on the test set.
The results are visualized as follows:
[Figure: classification results on the test set, produced by plotResult]
|
|