I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target (outlier) samples. For a detailed description of the algorithm, see:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
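SVDD encloses the training targets in a minimal hypersphere in the kernel feature space, and a sample z is accepted as a target when its squared distance to the centre a = Σ α_i φ(x_i) is at most R². The following MATLAB sketch evaluates that decision rule with the RBF kernel. The support vectors, coefficients alpha and g are hypothetical toy values, chosen symmetric so that all four support vectors are equidistant from the centre; in practice libsvm computes them during training.

```matlab
% SVDD decision rule with libsvm's RBF kernel (-t 2): K(u,v) = exp(-g*||u-v||^2).
% All values below are hypothetical toy values, not taken from a trained model.
SV = [1 0; -1 0; 0 1; 0 -1];       % support vectors, one per row
alpha = 0.25 * ones(4, 1);         % dual coefficients: sum(alpha) = 1, 0 <= alpha_i <= C
g = 0.5;                           % kernel parameter (libsvm's -g)
K = @(A, B) exp(-g * (sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B')));
% Squared distance from each row of Z to the centre a = sum_i alpha_i*phi(x_i).
% The leading 1 is K(z,z), which is always 1 for the RBF kernel.
dist2 = @(Z) 1 - 2 * K(Z, SV) * alpha + alpha' * K(SV, SV) * alpha;
% R^2: squared distance of a support vector with 0 < alpha_i < C (it lies on the sphere).
R2 = dist2(SV(1, :));
Z = [0 0; 3 3];                    % two query points
inside = dist2(Z) <= R2;           % true -> target (label 1), false -> outlier (label -1)
disp([dist2(Z), inside])
```

With these values the origin falls inside the sphere and [3 3] falls outside. A larger g shrinks the kernel's reach and makes the boundary tighter around the training data, which is one reason g has such a strong effect on the result.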
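The libsvm toolbox developed by Prof. Lin Chih-Jen of National Taiwan University and colleagues provides a MATLAB interface to the SVDD algorithm (SVDD, selected with -s 5 in the code below, comes from the SVDD extension of libsvm published on the LIBSVM Tools page rather than from the base package). Two key parameters, c (the penalty parameter C) and g (the RBF kernel parameter gamma), directly affect the SVDD one-class classification result. Building on this, I introduce the Whale Optimization Algorithm (WOA) to optimize these two parameters of the SVDD algorithm in libsvm.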
For a detailed description of WOA, see:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.

The authors of the algorithm have open-sourced their code on MathWorks.
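The core WOA update loop can be sketched in a few dozen lines of MATLAB. This is a minimal re-implementation of the update equations in Mirjalili and Lewis (2016), not the author's WOA.m: the coefficient a decreases linearly from 2 towards 0, agents with |A| < 1 encircle the best whale, agents with |A| >= 1 explore around a random whale, and half of the updates (p >= 0.5) follow the bubble-net spiral with b = 1. It minimizes a hypothetical toy objective (the sphere function) so that it runs without libsvm; in the SVDD setting fobj would be woa_obj and each row of X a candidate [c g].

```matlab
% Minimal WOA sketch after Mirjalili & Lewis (2016); not the author's WOA.m.
fobj = @(x) sum(x.^2);          % hypothetical toy objective standing in for woa_obj
agent = 20;                     % number of search agents (whales)
iteration = 100;                % maximum number of iterations
dim = 2;                        % number of variables
lb = [-5, -5];  ub = [5, 5];    % search bounds
X = lb + rand(agent, dim) .* (ub - lb);   % random initial positions
Best_pos = zeros(1, dim);
Best_score = inf;
for t = 1:iteration
    X = min(max(X, lb), ub);               % keep agents inside the bounds
    for i = 1:agent                        % update the best solution (the "prey")
        f = fobj(X(i, :));
        if f < Best_score
            Best_score = f;
            Best_pos = X(i, :);
        end
    end
    a = 2 - 2 * (t - 1) / iteration;       % a decreases linearly from 2 towards 0
    for i = 1:agent
        A = 2 * a * rand - a;              % coefficient A in [-a, a]
        C = 2 * rand;                      % coefficient C in [0, 2]
        p = rand;                          % choose encircling/search vs. spiral
        l = 2 * rand - 1;                  % spiral parameter in [-1, 1]
        if p < 0.5
            if abs(A) < 1                  % exploitation: shrink around the best whale
                D = abs(C * Best_pos - X(i, :));
                X(i, :) = Best_pos - A * D;
            else                           % exploration: move relative to a random whale
                X_rand = X(randi(agent), :);
                D = abs(C * X_rand - X(i, :));
                X(i, :) = X_rand - A * D;
            end
        else                               % bubble-net spiral update with b = 1
            D = abs(Best_pos - X(i, :));
            X(i, :) = D * exp(l) * cos(2 * pi * l) + Best_pos;
        end
    end
end
fprintf('Best_score = %g at [%g %g]\n', Best_score, Best_pos);
```

Because initialization and every update are random, repeated runs return slightly different optima.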
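Notes: (1) I have renamed libsvm's svmtrain and svmpredict functions to libsvmtrain and libsvmpredict, which avoids a clash with the svmtrain function that older MATLAB releases ship in the Statistics Toolbox.
(2) Like other swarm-intelligence optimizers, WOA can get trapped in local optima. If the optimization result looks abnormal, try running it a few more times.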
II. Example 1 (the heart_scale data shipped with libsvm)
1. Data description
The data set has 13 attributes and 270 samples: 120 positive and 150 negative. In this example, the positive samples form the training set (label 1) and the negative samples form the test set (label -1).
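The author's prepareData function is not listed. Judging from its outputs and the description above, it could look like the following hypothetical sketch: libsvmread (part of libsvm's MATLAB interface) reads heart_scale, the +1 samples become the training set and the -1 samples the test set. Since libsvmread needs the compiled MEX file and the data file, a random stand-in of the same shape is used here; the commented line shows where the real data would come from.

```matlab
% Hypothetical sketch of prepareData; the author's version is not shown.
% With the libsvm MATLAB interface the real data would be read by:
%   [heart_label, heart_inst] = libsvmread('heart_scale');
% Random stand-in with the same shape: 270 x 13, 120 labels of +1 and 150 of -1.
heart_label = [ones(120, 1); -ones(150, 1)];
heart_inst = rand(270, 13);
pos = (heart_label == 1);
traindata = full(heart_inst(pos, :));      % SVDD is trained on the target class only
trainlabel = ones(nnz(pos), 1);
testdata = full(heart_inst(~pos, :));      % all negative samples form the test set
testlabel = -ones(nnz(~pos), 1);
```

full() matters for the real data because libsvmread returns the instance matrix as a sparse matrix.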
2. Main program
```matlab
clc
clear all
close all
addpath(genpath(pwd))
global traindata trainlabel     % shared with the objective function woa_obj
% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;
% Parameter setting of WOA
agent = 10;                     % Number of search agents
iteration = 20;                 % Maximum number of iterations
lb = [10^-3, 2^-4];             % Lower bounds of 'c' and 'g'
ub = [10^0, 2^4];               % Upper bounds of 'c' and 'g'
dim = 2;                        % Number of parameters (c and g)
fobj = @woa_obj;                % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
```
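The option string passed to libsvmtrain can equivalently be built with sprintf, which makes each flag easy to annotate and lets you choose how many digits of c and g are passed on. A minimal sketch, with a hypothetical Best_pos standing in for the WOA result:

```matlab
% Hypothetical optimum [c, g] standing in for the Best_pos returned by WOA.
Best_pos = [0.5, 2];
% -s 5: SVDD; -t 2: RBF kernel; -c: penalty C; -g: kernel gamma; -q: quiet mode
cmd = sprintf('-s 5 -t 2 -c %.10g -g %.10g -q', Best_pos(1), Best_pos(2));
disp(cmd)
```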
Output of the last iteration and the final classification result:
```
ans =
19.0000 0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
20.0000 0.0667
Accuracy = 100% (150/150) (classification)
```
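As can be seen, the SVDD model built with the optimized parameters reaches 93.33% accuracy on the training set and 100% on the test set, i.e. all 150 negative samples are rejected as outliers.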
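The final run prints only the test accuracy, so the training accuracy has to be read off the last ans line. The author's woa_obj.m is not listed, but the logs are consistent with ans holding [iteration, Best_score] and with a fitness equal to the training error rate, 1 - accuracy(1)/100, where accuracy is the second output of libsvmpredict evaluated on the training set. Under that assumption the printed fitness values map back to the reported accuracies as follows:

```matlab
% Assumption: woa_obj returns the training error rate 1 - accuracy(1)/100.
err_ex1 = 1 - 112/120;     % Example 1: 112 of 120 targets accepted (93.33%)
err_ex2 = 1 - 399/400;     % Example 2: 399 of 400 targets accepted (99.75%)
fprintf('%.4f %.4f\n', err_ex1, err_ex2);   % prints 0.0667 0.0025
```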
III. Example 2 (industrial process data)
1. Data description
This example uses data from an industrial process. The data set has 10 attributes; the training set contains 400 positive samples, and the test set contains 80 samples (the first 40 positive, the last 40 negative).
2. Main program
```matlab
clc
clear all
addpath(genpath(pwd))
global traindata trainlabel     % shared with the objective function woa_obj
% Industrial process data (provides traindata, trainlabel, testdata, testlabel)
load(fullfile('data', 'data_2.mat'))
% Parameter setting of WOA
agent = 10;                     % Number of search agents
iteration = 30;                 % Maximum number of iterations
lb = [10^-3, 2^-7];             % Lower bounds of 'c' and 'g'
ub = [10^0, 2^7];               % Upper bounds of 'c' and 'g'
dim = 2;                        % Number of parameters (c and g)
fobj = @woa_obj;                % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
% Visualize the results
plotResult(testlabel, predictlabel)
```
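Because this test set mixes 40 targets and 40 outliers, the single accuracy reported by libsvmpredict can be split into the share of targets accepted and the share of outliers rejected. The sketch below shows the computation; the predictions are hypothetical (three labels flipped on purpose) and are not the author's results.

```matlab
% Test set of Example 2: first 40 samples are targets (+1), last 40 are outliers (-1).
testlabel = [ones(40, 1); -ones(40, 1)];
% Hypothetical predictions, only to exercise the formulas.
predictlabel = testlabel;
predictlabel([3, 45, 50]) = -predictlabel([3, 45, 50]);
isTarget = (testlabel == 1);
overall = mean(predictlabel == testlabel);           % the figure libsvmpredict reports
targetAcc = mean(predictlabel(isTarget) == 1);       % targets accepted
outlierAcc = mean(predictlabel(~isTarget) == -1);    % outliers rejected
fprintf('overall %.4f, targets %.4f, outliers %.4f\n', overall, targetAcc, outlierAcc);
```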
Output of the last iteration and the final classification result:
```
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
30.0000 0.0025
Accuracy = 93.75% (75/80) (classification)
```
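As can be seen, the SVDD model built with the optimized parameters reaches 99.75% accuracy on the training set and 93.75% (75 of 80 samples) on the test set.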
Visualization of the results:

[Figure: test-set classification results plotted by plotResult]