﻿ 一种加入类间因素的曲线聚类算法
«上一篇
 文章快速检索 高级检索

 智能系统学报  2019, Vol. 14 Issue (2): 362-368  DOI: 10.11992/tis.201709029 0

### 引用本文

XU Tengteng, WANG Rui, HUANG Hengjun. Curve clustering algorithms by adding the differences among clusters[J]. CAAI Transactions on Intelligent Systems, 2019, 14(2): 362-368. DOI: 10.11992/tis.201709029.

### 文章历史

Curve clustering algorithms by adding the differences among clusters
XU Tengteng , WANG Rui , HUANG Hengjun
School of Statistics, Lanzhou University of Finance and Economics, Lanzhou 730020, China
Abstract: With the improvement of accuracy and frequency of data collection, functional data has appeared. Curves' clustering is a fundamental exploratory task in functional data analysis, and To sovave currently curves clustering algorithms available are based on the differences within each cluster, which has resulted in a low distinction among different curves. Therefore, on the base of curve fitting and curve distance, and with constructed objective function, curves clustering algorithms will be put forward with the consideration of cluster differences. Simulated results show that the curve cluster improves clustering accuracy. The example analysis of hourly NO2 concentration (μg/m3) indicates that this kind of curves clustering algorithms has a better distinction among different clusters.
Key words: functional data    differences among clusters    curve clustering    B-spline    distance metric

1 加入类间因素的曲线聚类

1.1 曲线生成

 ${y_{ij}} = {X_i}\left( {{t_{ij}}} \right) + {\varepsilon _{ij}},\quad j = 1,2,\cdots ,m$ (1)

 ${X_i}\left( t \right) = \sum\limits_{k = 1}^\infty {{\alpha _{ik}}{\phi _{ik}}\left( t \right)}$ (2)

 ${X_i}\left( t \right) = \sum\limits_{k = 1}^K {{\alpha _{ik}}{\phi _{ik}}\left( t \right)} {\rm{ = }}{{{\alpha }}_i}^{\rm{T}}{{{\varPhi }}_i}\left( t \right)$ (3)

1) 对不同曲线 ${X_i}\left( t \right)\left( {i = 1,2,\cdots ,n} \right)$ 采用一组相同的基底表述；

2) 基底函数设定为等距节点B-样条基底。有

 ${X_i}\left( t \right){\rm{ = }}\sum\limits_{k = 1}^L {{\alpha _{ik}}{B_{k,M}}\left( t \right)} {\rm{ = }}{{{\alpha }}_i}^{\rm{T}}{{{B}}_M}\left( t \right)$ (4)

1.2 曲线距离

 ${d^2}\left( {i,j} \right) = {\left\| {{X_i} - {X_j}} \right\|^2}$ (5)

 ${X_i}\left( t \right) - {X_j}\left( t \right) = {\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]^{\rm{T}}}{{{B}}_M}\left( t \right)$ (6)

 ${d^2}\left( {i,j} \right) = {\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]^{\rm{T}}}{{K}}\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]$ (7)

 ${{K}}{\rm{ = }}\left[ {\begin{array}{*{20}{c}} {\left\langle {{B_{\rm{1}}},{B_1}} \right\rangle }&{\left\langle {{B_{\rm{1}}},{B_2}} \right\rangle }&{\cdots }&{\left\langle {{B_{\rm{1}}},{B_L}} \right\rangle } \\ {\left\langle {{B_2},{B_1}} \right\rangle }&{\left\langle {{B_2},{B_2}} \right\rangle }&{\cdots }&{\left\langle {{B_2},{B_L}} \right\rangle } \\ {\cdots }&{\cdots }& \ddots &{\cdots } \\ {\left\langle {{B_L},{B_1}} \right\rangle }&{\left\langle {{B_L},{B_2}} \right\rangle }&{\cdots }&{\left\langle {{B_L},{B_L}} \right\rangle } \end{array}} \right]$

 $\begin{array}{c} {d^2}\left( {i,j} \right) = {\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]^{\rm{T}}}{{K}}\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]{\rm{ = }} \\ {\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]^{\rm{T}}}L{L^{\rm{T}}}\left[ {{{{\alpha }}_i} - {{{\alpha }}_j}} \right]{\rm} = \\ {\left[ {{L^{\rm{T}}}{{{\alpha }}_i} - {L^{\rm{T}}}{{{\alpha }}_j}} \right]^{\rm{T}}} \left[ {{L^{\rm{T}}}{{{\alpha }}_i} - {L^{\rm{T}}}{{{\alpha }}_j}} \right]{\rm{ = }} {\left[ {{{{b}}_i} - {{{b}}_j}} \right]^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_j}} \right] \end{array}$ (8)

${{{b}}_i}{\rm{ = }}{{L^{\rm T}}}{{{\alpha }}_i}$ 得到 ${{B}} = {{A}}{{{L}}}$ ，其中 ${{A}}{\rm{ = [}}{{{\alpha }}_1}\;{{{\alpha }}_2}\;\cdots \;{{{\alpha }}_{{n_p}}}{{\rm{]}}^{\rm{T}}}$ ${{B}} = [{{{b}}_1}\;{{{b}}_2}\;\cdots \;{{b}} _{{n_p}}]^{\rm{T}}$ ${n_p}$ 表示第 ${G_p}$ 类中的曲线数量，令 $\bar X\left( {{t_0}} \right)$ 表示随机选取的一条曲线作为初始类中心， ${\bar X^{\left( {{G_p}} \right)}}\left( t \right)$ 表示第 ${G_p}$ 类中的类中心。则有 ${\bar X^{\left( {{G_p}} \right)}}\left( t \right) =$ ${n_p}^{ - 1}{{{1}}^{\rm{T}}}{{B}}{{{L}}^{ - {\rm{1}}}}{{{B}}_M}\left( t \right)$

1.3 改进的曲线聚类算法

 $F\left( {{{\varPhi ,U}}} \right){\rm{ = }}\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } \frac{{{{\left\| {{X_i}\left( t \right) - \bar X\left( t \right)} \right\|}^2}}}{{{{\left\| {\bar X\left( t \right) - \bar X\left( {{t_0}} \right)} \right\|}^2}}}$ (9)

 $F\left( {{{\varPhi ,U}}} \right){\rm{ = }}\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } \frac{{{{\left\| {{{{\alpha }}_i}^{\rm{T}}{{{B}}_M}\left( t \right) - \bar X\left( t \right)} \right\|}^2}}}{{{{\left\| {\bar X\left( t \right) - \bar X\left( {{t_0}} \right)} \right\|}^2}}}$ (10)

 \begin{aligned} {\rm{ }}F\left( {{{\varPhi }},{{U}}} \right){\rm{ = }}\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } \frac{{{{\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}} \end{aligned} (11)

1) 固定 ${{\varPhi }}{\rm{ = }}{{\hat \varPhi }}$ ，求解函数 $F\left( {{{\hat \varPhi }},{{U}}} \right)$

2) 固定 ${{U}}{\rm{ = }}{{\hat U}}$ ，求解函数 $F\left( {{{\varPhi }},\hat { U}} \right)$

 ${u_{ik}}{\rm{ = }}\left\{ \begin{array}{l} 1,\quad \displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {\frac{{{{\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}}} \leqslant {{D}}} \\ 0,{\text{其他}} \end{array} \right.$ (12)

$\displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } {\left( {{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]} \right)^{\rm{2}}}$ = 0时，令 ${{{b}}_ * }{\rm{ = }}{{{b}}_0}$

$\displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } {\left( {{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]} \right)^{\rm{2}}} \ne 0$ 时，目标函数式(11)关于b*，求导

 $\begin{array}{l} \displaystyle\frac{{\partial F\left( {{{\varPhi }},{{\hat U}}} \right)}}{{\partial {{{b}}_ * }}} = \displaystyle\sum\limits_{i = 1}^n {\displaystyle\sum\limits_{k = 1}^K {{u_{ik}}} } \frac{{\partial \left( {\displaystyle\frac{{{{\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}}} \right)}}{{\partial {{{b}}_ * }}}{\rm{ = }}\\ \qquad\qquad\qquad \displaystyle\sum\limits_{i = 1}^n {\displaystyle\sum\limits_{k = 1}^K {{u_{ik}}} } \displaystyle\frac{{{\rm{2}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}}{\rm} -\\ \qquad\qquad\qquad\displaystyle\frac{{{\rm{2}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]{{\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left( {{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]} \right)}^{\rm{2}}}}}{\rm{ = 0}} \end{array}$

 $\begin{array}{l} \quad\;\; \displaystyle\sum\limits_{i = 1}^n {\displaystyle\sum\limits_{k = 1}^K {{u_{ik}}} } \displaystyle\frac{{{\rm{2}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}}{\rm{ = }}\\ \quad\;\; \displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } \displaystyle\frac{{{\rm{2[}}{{{b}}_ * } - {{{b}}_0}]{{\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}^{\rm{T}}}\left[ {{{{b}}_i} - {{{b}}_ * }} \right]}}{{{{\left( {{{\left[ {{{{b}}_ * } - {{{b}}_0}} \right]}^{\rm{T}}}\left[ {{{{b}}_ * } - {{{b}}_0}} \right]} \right)}^{\rm{2}}}}} \end{array}$

 ${{{b}}_ * }{\rm{ = }}\displaystyle\frac{{\displaystyle\sum\limits_{i = 1}^n {\displaystyle\sum\limits_{k = 1}^K {{u_{ik}}} } {{\left[ {{{{b}}_i} - {{{b}}_0}} \right]}^{\rm{T}}}{{{b}}_i}}}{{\displaystyle\sum\limits_{i = 1}^n {\displaystyle\sum\limits_{k = 1}^K {{u_{ik}}} } {{\left[ {{{{b}}_i} - {{\bf{b}}_0}} \right]}^{\rm{T}}}}}$

 ${{{b}}_ * }{\rm{ = }}\left\{ \begin{gathered} {{{b}}_{\rm{0}}}, \;\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } \left[ {\left| {{{{b}}_ * }{{{b}}_ * }^{\rm{T}}} \right|{{\left( {{{{b}}_ * }{{{b}}_ * }^{\rm{T}}} \right)}^{ - 1}}{{{b}}_ * } - {{{b}}_0}} \right] = 0 \hfill \\ \frac{{\displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } {{\left[ {{{{b}}_i} - {{{b}}_0}} \right]}^{\rm{T}}}{{{b}}_i}}}{{\displaystyle\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^K {{u_{ik}}} } {{\left[ {{{{b}}_i} - {{{b}}_0}} \right]}^{\rm{T}}}}},\;\;{\text{其他}} \hfill \\ \end{gathered} \right.$ (13)

Input: $X = \left\{ {{X_1}},{{X_2}},\cdots {{X_n}} \right\},k$

Initialize: Randomly choose an initial ${{{b}}_0} =$ ${{{b}}_1},{{{b}}_2},\cdots ,$ bk

Repeat

Fixed ${{\varPhi }}$ , use eq. (12) to solve ${{U}}$

Fixed ${{U}}$ , use eq. (13) to solve ${{\varPhi }}$

Until convergence.

2 算法效果模拟验证与分析

 Download: 图 1 模拟数据曲线聚类对比 Fig. 1 Comparison with simulated data of curve's clustering

3 NO2小时浓度曲线聚类效果分析

 Download: 图 2 曲线聚类类中心对比 Fig. 2 Comparison with curve cluster's center generated by different algorithms

 Download: 图 3 NO2小时浓度数据曲线聚类对比 Fig. 3 Comparison with curve clustering of NO2 concentration