分段聚合近似法

分段聚合近似法（英文：Piecewise Aggregate Approximation，PAA）是一種時間序列數據的降維方法，最早由埃蒙·基奧（英語：Eamonn Keogh）（Eamonn Keogh）等人提出，用於建立時間序列索引^[1]。相比於離散傅里葉變換、離散小波變換、奇異值分解等降維方法，分段聚合近似法操作比較簡便，適用於更多距離度量，例如加權歐氏距離。並且分段聚合近似法還適用於索引長度和查詢長度不同的情況^[2]。如今分段聚合近似法已經成為一種廣泛應用的時間序列處理方法^[3]^[4]。

定義[編輯]

設時間序列 $X=x_{1},...,x_{n}$ 的長度為 $n$ 。將其用一條長度為 $N$ 的向量 ${\overline {X}}$ 表示， ${\overline {X}}$ 第 $i$ 個元素的定義為^[5]： ${\overline {X}}_{i}={\frac {N}{n}}\sum _{j={\frac {n}{N}}(i-1)+1}^{{\frac {n}{N}}i}x_{j}$

簡言之，為了將原始時間序列從 $n$ 維降低到 $N$ 維，需要將原始數據分割成 $N$ 個等長的分段，在每個分段內計算均值就可以得到降維後的數據表示。

當 $N=n$ 時，分段聚合近似法得到的向量就是原始時間序列本身，當 $N=1$ 時，分段聚合近似法得到的得到值就是原始時間序列的均值^[6]。

索引建立方法[編輯]

用分段聚合近似法建立用於子序列匹配的索引的時間複雜度為 $O(nm)$ 。因為對於大約 $m$ 個「窗口」，每個分段都要用上述公式計算 $N$ 次，並且上述公式要對長度為 ${\frac {n}{N}}$ 的部分求和。埃蒙·基奧（Eamonn Keogh）提出了一種快速計算的方法，可以將時間複雜度降低到 $O(Nm)$ ：每次計算時只要從上次的結果減去上一段離開「窗口」的數據點的部分，加上新進入「窗口」的數據點的部分即可。在第 $j$ 個「窗口」內的第 $i$ 個值可以通過以下公式更新^[7]： ${\overline {x}}_{ji}={\overline {x}}_{j-1i}-{\frac {N}{n}}x_{{\frac {n}{N}}(i-1)+1}+{\frac {N}{n}}x_{{\frac {n}{N}}i+1}$

應用領域[編輯]

作為一種時間序列降維方法，分段聚合近似法得到了廣泛的應用，是一些時間序列的低維表示方法的前期處理步驟之一^[8]。在使用分段聚合近似法建立索引後，為了進行各種查詢，要使用某種距離 $D_{PAA}$ 進行相似度（英語：Similarity measure）度量。為了避免假陰性情況的出現，所使用的距離需要滿足以下特徵^[9]： $D_{PAA}({\overline {X}},{\overline {Y}})\leq D(X,Y)$ 如果 $D_{PAA}$ 的定義為： $D_{PAA}({\overline {X}},{\overline {Y}}):={\sqrt {\frac {n}{N}}}{\sqrt {\sum _{i=1}^{N}({\overline {x}}_{i}-{\overline {y}}_{i})^{2}}}$ ，則滿足上述條件^[10]^[11]。一些時間序列的檢索方法，例如K-NN的算法中，會使用具有以上特徵的距離，根據索引進行初步篩選^[12]。

參考資料[編輯]

^ Tak-chung Fu. A review on time series data mining. Engineering Applications of Artificial Intelligence. 2011, 24 (1): 164–181 [2019-06-05]. （原始內容存檔於2019-06-05）（英語）.
^ 原繼東; 王志海. 时间序列的表示与分类算法综述. 計算機科學. 2015, 42 (3) [2019-06-05]. （原始內容存檔於2019-06-05）.
^ Bakar, Azuraliza Abu; Almahdi Mohammed Ahmed, and Abdul Razak Hamdan. Discretization of time series dataset using relative frequency and K-nearest neighbor approach. International Conference on Advanced Data Mining and Applications. Berlin, Heidelberg: Springer: 193–201. 2010-11 [2019-06-05]. （原始內容存檔於2019-06-05）（英語）.
^ Kulahcioglu, Burcu, Serhan Ozdemir, and Bora Kumova. Application of symbolic Piecewise Aggregate Approximation (PAA) analysis to ECG signals (PDF). 17th IASTED International conference on applied simulation and modelling. 2008-01 [2019-06-05] （英語）.
^ 李海林; 郭崇慧; 楊麗彬. 基于分段聚合时间弯曲距离的时间序列挖掘. 山東大學學報 (工學版). 2011, 41 (5): 57–62 [2019-06-05]. （原始內容存檔於2019-06-05）.
^ Vignesh Krishnamoorthy. Piecewise Aggregate Approximation. vigne.sh. 2018-02 [2019-06-05]. （原始內容存檔於2020-10-21）（英語）.
^ Keogh, Eamonn; Kaushik Chakrabarti; Michael Pazzani; Sharad Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems. 2001-08, 3 (3): 263–286 [2019-06-05]. （原始內容存檔於2016-05-02）（英語）.
^ 时间序列数据的一种符号化表示方法. Google Patent. 2016-11-09 [2019-06-04].
^ Chonghui Guo; Hailin Li, Donghua Pan. An Improved Piecewise Aggregate Approximation Based on Statistical Features for Time Series Mining. International conference on knowledge science, engineering and management. An improved piecewise aggregate approximation based on statistical features for time series mining: 234–244. 2010-09-01. （原始內容存檔於2018-06-17）使用|archiveurl=需要含有|url= (幫助) （英語）.
^ Fuad, Muhammad Marwan Muhammad. Using Differential Evolution to Set Weights to Segments with Different Information Content in the Piecewise Aggregate Approximation. KES. 2012: 440–449 [2019-06-05]. （原始內容存檔於2015-10-31）（英語）.
^ Jessica Lin; Eamonn Keogh; Li Wei; Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery: 101–144. [2019-06-05]. （原始內容存檔於2020-02-10）（英語）.
^ Keogh, Eamonn; Chotirat Ann Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and information systems. 2005, 7 (3): 358–386 [2019-06-04]. （原始內容存檔於2019-02-26）（英語）.

外部連結[編輯]

[Fu_2011-1] Tak-chung Fu. A review on time series data mining. Engineering Applications of Artificial Intelligence. 2011, 24 (1): 164–181 [2019-06-05]. （原始內容存檔於2019-06-05）（英語）.

[原继东_综述-2] 原繼東; 王志海. 时间序列的表示与分类算法综述. 計算機科學. 2015, 42 (3) [2019-06-05]. （原始內容存檔於2019-06-05）.

[Baker_2010-3] Bakar, Azuraliza Abu; Almahdi Mohammed Ahmed, and Abdul Razak Hamdan. Discretization of time series dataset using relative frequency and K-nearest neighbor approach. International Conference on Advanced Data Mining and Applications. Berlin, Heidelberg: Springer: 193–201. 2010-11 [2019-06-05]. （原始內容存檔於2019-06-05）（英語）.

[Kulahcioglu_2008-4] Kulahcioglu, Burcu, Serhan Ozdemir, and Bora Kumova. Application of symbolic Piecewise Aggregate Approximation (PAA) analysis to ECG signals (PDF). 17th IASTED International conference on applied simulation and modelling. 2008-01 [2019-06-05] （英語）.

[李海林_2011-5] 李海林; 郭崇慧; 楊麗彬. 基于分段聚合时间弯曲距离的时间序列挖掘. 山東大學學報 (工學版). 2011, 41 (5): 57–62 [2019-06-05]. （原始內容存檔於2019-06-05）.

[Vignesh_Krishnamoorthy_PAA-6] Vignesh Krishnamoorthy. Piecewise Aggregate Approximation. vigne.sh. 2018-02 [2019-06-05]. （原始內容存檔於2020-10-21）（英語）.

[Keogh_2001-7] Keogh, Eamonn; Kaushik Chakrabarti; Michael Pazzani; Sharad Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems. 2001-08, 3 (3): 263–286 [2019-06-05]. （原始內容存檔於2016-05-02）（英語）.

[时序专利柴毅-8] 时间序列数据的一种符号化表示方法. Google Patent. 2016-11-09 [2019-06-04].

[Guo_2010-9] Chonghui Guo; Hailin Li, Donghua Pan. An Improved Piecewise Aggregate Approximation Based on Statistical Features for Time Series Mining. International conference on knowledge science, engineering and management. An improved piecewise aggregate approximation based on statistical features for time series mining: 234–244. 2010-09-01. （原始內容存檔於2018-06-17）使用|archiveurl=需要含有|url= (幫助) （英語）.

[MMM_Fuad_2012-10] Fuad, Muhammad Marwan Muhammad. Using Differential Evolution to Set Weights to Segments with Different Information Content in the Piecewise Aggregate Approximation. KES. 2012: 440–449 [2019-06-05]. （原始內容存檔於2015-10-31）（英語）.

[Lin_2007-11] Jessica Lin; Eamonn Keogh; Li Wei; Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery: 101–144. [2019-06-05]. （原始內容存檔於2020-02-10）（英語）.

[Keogh_2005-12] Keogh, Eamonn; Chotirat Ann Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and information systems. 2005, 7 (3): 358–386 [2019-06-04]. （原始內容存檔於2019-02-26）（英語）.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]