I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that distinguishes target samples from non-target samples. For a detailed description of the algorithm, see the following references:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
The libsvm toolbox, developed by Professor Chih-Jen Lin (林智仁) of National Taiwan University and colleagues, provides a MATLAB interface to the SVDD algorithm. Its two key parameters, c and g, directly affect the one-class classification result of SVDD. Building on this, I use the Whale Optimization Algorithm (WOA) to optimize these parameters for the SVDD implementation in the libsvm toolbox.
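For context, the role of the two parameters can be read off the standard SVDD formulation of Tax and Duin combined with an RBF kernel. The symbols below (R, a, ξ_i, φ, l) are the usual textbook notation, not something defined in this post, and mapping libsvm's -c and -g options onto C and the kernel coefficient g is the conventional reading:

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{l} \xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, l \\[4pt]
\text{with}\quad & K(x, z) = \phi(x)^{\top} \phi(z) = \exp\!\left(-g \lVert x - z \rVert^2\right)
\end{aligned}
```

Here c trades the volume of the hypersphere against the training samples left outside it, and g sets the width of the RBF kernel. In the dual problem the multipliers satisfy 0 ≤ α_i ≤ C and Σ α_i = 1, so any C ≥ 1 behaves like C = 1, which is presumably why the search ranges in the examples cap c at 10^0.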
For a detailed description of WOA, see the following reference:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.
The authors of the algorithm have released their code as open source on MathWorks.
Notes: (1) I have renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm intelligence optimization algorithms, WOA is prone to getting trapped in local optima. If the optimization result looks abnormal, try running it a few more times.
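The advice in note (2) can be automated with a small wrapper that repeats the search and keeps the best run. This is only a sketch: bestOfRuns is a hypothetical helper, not part of WOA or libsvm, and it assumes that a lower score is better, as in the examples below.

```matlab
function [bestScore, bestPos, allScores] = bestOfRuns(runOnce, nRuns)
% BESTOFRUNS  Call a stochastic optimizer nRuns times and keep the best run.
%   runOnce is a zero-argument function handle returning [score, position],
%   where a lower score is assumed to be better.
    bestScore = inf;
    bestPos = [];
    allScores = zeros(nRuns, 1);       % score of every run, for inspection
    for k = 1:nRuns
        [score, pos] = runOnce();
        allScores(k) = score;
        if score < bestScore           % keep only strict improvements
            bestScore = score;
            bestPos = pos;
        end
    end
end
```

With the variables from the main programs below, a call could look like `[Best_score, Best_pos] = bestOfRuns(@() WOA(agent, iteration, lb, ub, dim, fobj), 5);`. The spread of allScores gives a rough idea of how often the search stalls in a local optimum.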
II. Example 1 (heart_scale data shipped with the libsvm toolbox)
1. Data description
The data set has 13 attributes and 270 samples: 120 positive and 150 negative. In this example, the positive samples form the training set with label 1, and the negative samples form the test set with label -1.
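The post does not show prepareData, so the following is only a sketch of how such a split could be done. splitOneClass is a hypothetical helper; it assumes the labels are +1 and -1, as they are in heart_scale.

```matlab
function [traindata, testdata, trainlabel, testlabel] = splitOneClass(features, labels)
% SPLITONECLASS  Positive samples become the training set, negatives the test set.
%   features: n-by-d matrix, labels: n-by-1 vector of +1 / -1.
    isPos = labels == 1;
    traindata  = features(isPos, :);
    trainlabel = ones(nnz(isPos), 1);      % targets are labelled 1
    testdata   = features(~isPos, :);
    testlabel  = -ones(nnz(~isPos), 1);    % non-targets are labelled -1
end
```

The data itself can be read with libsvm's own reader, for example `[labels, features] = libsvmread('heart_scale');`, followed by `[traindata, testdata, trainlabel, testlabel] = splitOneClass(features, labels);`.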
2. Main program

```matlab
clc
clear all
close all
addpath(genpath(pwd))
global traindata trainlabel   % globals, presumably read by woa_obj

% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;

% Parameter setting of WOA
agent = 10;             % Number of search agents
iteration = 20;         % Maximum number of iterations
lb = [10^-3, 2^-4];     % Lower bound of 'c' and 'g'
ub = [10^0, 2^4];       % Upper bound of 'c' and 'g'
dim = 2;                % Number of parameters
fobj = @woa_obj;        % Objective function

% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);

% Train SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);

% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
```
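The objective function woa_obj is not shown in the post. A minimal sketch of what such a fitness function might look like is given below, assuming it scores a candidate (c, g) pair by the training error rate of the resulting SVDD model. That assumption is consistent with the printed scores (a best score of 0.0667 matches a training accuracy of 93.33%), but the author's actual implementation may differ. svddFitness is a hypothetical name, and the train and predict functions are passed in as handles so the sketch does not depend on globals.

```matlab
function fitness = svddFitness(pos, X, y, trainFcn, predictFcn)
% SVDDFITNESS  Training error rate of an SVDD model for pos = [c, g].
%   trainFcn / predictFcn are handles with the libsvm MATLAB calling
%   convention, e.g. @libsvmtrain and @libsvmpredict as renamed in this post.
    cmd = sprintf('-s 5 -t 2 -c %g -g %g -q', pos(1), pos(2));
    model = trainFcn(y, X, cmd);
    % The second output is [accuracy in percent; MSE; squared correlation]
    [~, acc, ~] = predictFcn(y, X, model);
    fitness = 1 - acc(1) / 100;    % lower is better
end
```

It could be plugged into the optimizer as `fobj = @(pos) svddFitness(pos, traindata, trainlabel, @libsvmtrain, @libsvmpredict);`, which also removes the need for the global declaration.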
Output of the last iteration and the final classification result:

```
ans =
   19.0000    0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
   20.0000    0.0667
Accuracy = 100% (150/150) (classification)
```
As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 93.33% on the training set and 100% on the test set.
III. Example 2 (industrial process data)
1. Data description
This example uses data from an industrial process. The data set has 10 attributes. The training set contains 400 positive samples, and the test set contains 80 samples (the first 40 are positive, the last 40 are negative).
2. Main program

```matlab
clc
clear all
addpath(genpath(pwd))
global traindata trainlabel   % globals, presumably read by woa_obj

% Industrial process data (the file is expected to provide traindata,
% trainlabel, testdata and testlabel)
load(fullfile('.', 'data', 'data_2.mat'))

% Parameter setting of WOA
agent = 10;             % Number of search agents
iteration = 30;         % Maximum number of iterations
lb = [10^-3, 2^-7];     % Lower bound of 'c' and 'g'
ub = [10^0, 2^7];       % Upper bound of 'c' and 'g'
dim = 2;                % Number of parameters
fobj = @woa_obj;        % Objective function

% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);

% Train SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);

% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);

% Visualize the results
plotResult(testlabel, predictlabel)
```
Output of the last iteration and the final classification result:

```
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
   30.0000    0.0025
Accuracy = 93.75% (75/80) (classification)
```
As can be seen, the SVDD model built with the optimized parameters reaches an accuracy of 99.75% on the training set and 93.75% on the test set.
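The overall figure of 93.75% (75/80) does not show how the five errors split between rejected targets and accepted non-targets. A short sketch of a per-class breakdown follows. The label vectors are made-up illustrative values, not the post's data, and the code assumes that libsvmpredict returns +1 / -1 labels as in the examples above.

```matlab
% Illustrative labels only: first half targets (+1), second half non-targets
% (-1), mirroring the layout of the 80-sample test set on a smaller scale.
testlabel    = [ones(4, 1); -ones(4, 1)];
predictlabel = [1; 1; 1; -1; -1; -1; -1; 1];

isTarget   = testlabel == 1;
targetAcc  = mean(predictlabel(isTarget)  ==  1);   % share of targets accepted
outlierAcc = mean(predictlabel(~isTarget) == -1);   % share of non-targets rejected
overallAcc = mean(predictlabel == testlabel);       % what libsvmpredict reports

fprintf('target %.2f%%, non-target %.2f%%, overall %.2f%%\n', ...
        100 * targetAcc, 100 * outlierAcc, 100 * overallAcc);
```

Replacing the two illustrative vectors with the testlabel and predictlabel from the main program gives the breakdown for the real test set.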
The visualized results are shown below:
[Figure: visualization of the classification results for Example 2]