Acta Geographica Sinica ›› 2020, Vol. 75 ›› Issue (5): 904-916. doi: 10.11821/dlxb202005002

• Theory and Methods •

Construction and practice of a geographic spatio-temporal tri-clustering analysis method

程昌秀1,2,3, 宋长青1,2, 吴晓静1,2, 沈石1,2, 高培超1,2, 叶思菁1,2

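  1. State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing 100875
  2. Faculty of Geographical Science, Beijing Normal University, Beijing 100875
  3. National Tibetan Plateau Data Center, Beijing 100101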
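  • Received: 2020-02-06 Revised: 2020-04-22 Online: 2020-05-25 Published: 2020-07-25
  • Corresponding author: SONG Changqing, E-mail: songcq@bnu.edu.cn
  • About the author: CHENG Changxiu (1973-), female, from Xinjiang, professor, mainly engaged in research on geographic spatio-temporal data analysis. E-mail: chengcx@bnu.edu.cn
  • Funding:
    National Key R&D Program of China (2019YFA0606901); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA23100303)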

Tri-clustering: Construction and practice of space-time integrated analysis tool

CHENG Changxiu1,2,3, SONG Changqing1,2, WU Xiaojing1,2, SHEN Shi1,2, GAO Peichao1,2, YE Sijing1,2

  1. State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing 100875, China
  2. Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  3. National Tibetan Plateau Data Center, Beijing 100101, China
  • Received: 2020-02-06 Revised: 2020-04-22 Online: 2020-05-25 Published: 2020-07-25
  • Contact: SONG Changqing, E-mail: songcq@bnu.edu.cn
  • Supported by:
    National Key R&D Program of China (2019YFA0606901); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA23100303)

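Abstract (translated from the Chinese):

As the capacity to acquire geographic data keeps improving, the volume of geographic data is growing exponentially, and data types and characteristics are becoming more diverse. Effectively identifying and classifying data has become the key to understanding the spatio-temporal characteristics, evolutionary processes, and behavioral mechanisms of geographic phenomena. Traditional clustering methods are challenged by data that are large in volume, high in dimensionality, and poor in quality; together with the demand for analyzing associations between geographic space and time, this makes research on improving clustering methods increasingly urgent. This paper describes how the thinking behind clustering has shifted from one-way clustering to tri-clustering. One-way clustering clusters only along the sample direction or the attribute direction; it tends to overlook local features that are highly similar and to fall into the error captured by the verse 横看成岭侧成峰 ("a ridge when viewed from the front, a peak when viewed from the side"), where the conclusion depends on the direction of view. Co-clustering partitions the data matrix into submatrices according to the similarity of element values, so that elements within a submatrix are as similar as possible and elements in different submatrices are as dissimilar as possible; rows and columns are thus clustered simultaneously, which avoids the shortcomings of one-way clustering. Because co-clustering cannot meet the need of geographic research for interpretation along more than two directions, this paper proposes and develops a new tri-clustering method, gives a workflow for using it to detect geographic spatio-temporal patterns and processes, and summarizes how to build a three-dimensional data cube from the "space-time-scale-attribute" aspects involved in a study; finally, it presents geographic case studies of tri-clustering. The results show that: (1) tri-clustering is an effective method for detecting the spatio-temporal differentiation of geographic data in the era of big data, and can cope with problems such as high dimensionality and low data quality; (2) across different geographic problems, tri-clustering is generic at the algorithmic level; the only difference lies in building different data cubes according to the space, time, scale, and attributes each problem involves, and the different results obtained by clustering different data cubes answer different geographic questions; (3) tri-clustering enables joint interpretation of the spatio-temporal differentiation of geographic data in multiple directions, at multiple scales, and at multiple levels, revealing the superposition effects of spatio-temporal scales on geographic features. Finally, the paper stresses the importance of organizing data according to the geographic question, and looks forward to advancing the use of tri-clustering in geographic research involving multiple spatial scales and multiple attributes.

Keywords: tri-clustering, space-time-scale-attribute, integrated interpretation, spatio-temporal local similarity, spatio-temporal differentiation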

Abstract:

With the improvement of geographic data acquisition capabilities, the volume of geographic data has been growing exponentially, and data types and characteristics have become more diverse. Effective identification and classification of data has become the key to understanding the spatio-temporal patterns, evolutionary processes, and driving mechanisms of geographic phenomena. However, traditional clustering methods face challenges from the large volume, high dimensionality, and poor quality of the data, so improving them has become necessary. This paper first describes the shift from one-way clustering to tri-clustering. One-way clustering methods cluster along either the samples or the attributes. They played an important role in previous studies, but they tend to overlook local features that are very similar. Co-clustering methods partition the data matrix into submatrices based on the similarity of element values, placing similar elements in the same submatrix and dissimilar ones in different submatrices. By clustering rows and columns simultaneously, they avoid the shortcomings of one-way clustering. However, they cannot meet the need of geographic research for interpretation along more than two directions, since they do not support 3D data. We therefore develop a new tri-clustering method, present a workflow for applying it to the study of spatio-temporal patterns, and summarize how to construct the 3D data matrix for clustering according to the aspects of "space-time-scale-attribute" involved in the analysis. Finally, we show some applications of tri-clustering. The results show that: (1) Tri-clustering is an effective method for identifying the spatio-temporal differentiation of geographic data in the era of big data, and it can cope with problems such as high dimensionality and low data quality. (2) Tri-clustering is generic at the algorithmic level across different geographic topics; the differences lie in the 3D data matrices, which are constructed according to the aspects of "space-time-scale-attribute" involved in the analysis. Clustering different data matrices yields different results, which answer different questions. (3) Tri-clustering can interpret the spatio-temporal differentiation of geographic data in multiple directions, at multiple scales, and at multiple hierarchical levels, and thereby reveal the superposition effects of spatio-temporal scales on geographic features. Finally, we emphasize the significance of constructing 3D data matrices based on the geographic topic at hand, and expect that tri-clustering methods can enhance the ability to analyze geographic data with multiple spatial scales and attributes in the future.

Key words: tri-clustering, space-time-scale-attribute, integrated interpretation, spatio-temporal local similarity, spatio-temporal differentiation
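
The idea of a 3D data matrix whose blocks should be internally similar can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. The axis layout (locations x time steps x attributes), the function names `block_means` and `within_block_sse`, and the toy values are all assumptions made for illustration. The sketch only scores a given partition by its within-block squared error, one common way of expressing "similar within blocks, dissimilar across blocks"; it does not search for the best partition.

```python
import numpy as np


def block_means(cube, row_labels, col_labels, depth_labels):
    """Mean value of every block defined by cluster labels on the three axes."""
    k = row_labels.max() + 1
    l = col_labels.max() + 1
    m = depth_labels.max() + 1
    means = np.zeros((k, l, m))
    for a in range(k):
        for b in range(l):
            for c in range(m):
                # np.ix_ turns the three boolean masks into an open mesh,
                # selecting the sub-cube that belongs to block (a, b, c).
                block = cube[np.ix_(row_labels == a,
                                    col_labels == b,
                                    depth_labels == c)]
                means[a, b, c] = block.mean()
    return means


def within_block_sse(cube, row_labels, col_labels, depth_labels):
    """Sum of squared deviations of each element from its block mean."""
    means = block_means(cube, row_labels, col_labels, depth_labels)
    # Expand the block means back to the shape of the full cube.
    fitted = means[np.ix_(row_labels, col_labels, depth_labels)]
    return float(((cube - fitted) ** 2).sum())


# Hypothetical toy cube: 4 locations x 4 time steps x 2 attributes,
# generated so that every block is exactly constant.
rows = np.array([0, 0, 1, 1])   # location clusters
cols = np.array([0, 0, 1, 1])   # time-step clusters
deps = np.array([0, 1])         # attribute clusters
base = np.arange(8, dtype=float).reshape(2, 2, 2)
cube = base[np.ix_(rows, cols, deps)]

# The generating partition fits perfectly; a scrambled one does not.
good = within_block_sse(cube, rows, cols, deps)
bad = within_block_sse(cube, np.array([0, 1, 0, 1]), cols, deps)
print(good, bad)
```

A partition search (for example, alternately reassigning labels on each axis to reduce this error) would sit on top of such a score. Which aspects of "space-time-scale-attribute" go on which axis is the modelling decision the abstract emphasizes, and it is independent of the scoring code.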