Theory and Method
CHENG Changxiu, SONG Changqing, WU Xiaojing, SHEN Shi, GAO Peichao, YE Sijing
With the improvement of geographic data acquisition capabilities, the volume of geographic data has been growing exponentially, and the data types as well as characteristics have become more diverse. The effective identification and classification of data has become the key to understand spatio-temporal patterns, evolutionary processes, and driving mechanisms of geographic phenomena. However, traditional clustering methods are facing some challenges, such as large amount, high-dimensionality and poor-quality of the data to be dealt with. Therefore, it is necessary to improve clustering methods. This paper first describes the transformation from one-way clustering to tri-clustering. One-way clustering methods perform the clustering analysis along with the samples or the attributes. They played an important role in previous studies, but ignored local features that are very similar. Co-clustering methods perform the submatrix partitioning scheme based on location similarity of elements within the data matrix. They avoid shortages of one-way clustering by realizing the clustering from both rows and columns, making similar elements into the same submatrix and dissimilar ones into different ones. However, they cannot satisfy multiple directions interpretations of geographical research since they do not support 3D panel data body. Then, we develop a new tri-clustering method, presents the workflow of using tri-clustering to spatio-temporal patterns' studies, and summarizes how to construct the 3D data matrix for clustering according to different aspects of 'space-time-scale-attribute' involved in the analysis. Finally, we show some practices of tri-cluster. The results show that: (1) Tri-clustering is an effective method to identify the spatio-temporal differentiation of geographic data in the era of big data by solving problems, i.e. data of high dimensionality and low quality. (2) Tri-clustering is universal in the algorithmic level when facing different geographic topics, but the differences rely on the 3D data matrices constructed according to different aspects of "space-time-scale-attribute" involved in the analysis. And, different data matrices are clustered to different results, which answer different topics. (3) Tri-clustering is able to interpret the spatio-temporal differentiation of geographic data in multiple directions, multiple scales, and multiple hierarchies, and thereby reveal the superposition effects of spatio-temporal scales of geographic features. Finally, we emphasize the significance of constructing 3D data matrices based on different geographic topics and expect that tri-clustering methods can enhance the ability to analyze geographic data with multiple spatial scales and attributes in the future.