Acta Geographica Sinica

Content of Models and Methods in our journal


Models and Methods
Evaluation and enhancement methods of POI data quality in the context of geographic big data

XUE Bing, ZHAO Bingyu, LI Jingzhong

Acta Geographica Sinica. 2023, 78(5): 1290-1303. https://doi.org/10.11821/dlxb202305014


CSCD(10)

Geographic big data enables a fine-grained depiction of regional human-terrestrial systems and provides new data for studying human-terrestrial relations and regional development. Geographic big data research has now entered a stage of widespread application, but the examination of data quality, and the evaluation methods needed to guarantee that the data are applied widely and efficiently, have been lacking. Points of interest (POI) are an important part of geographic big data and play an important role in location-based services and in understanding regional scenarios. This paper proposes a method for assessing and enhancing POI-type big data. Using on-site surveys, GIS and other methods, it evaluates quality along three dimensions: feature identification completeness, data redundancy rate and spatial location accuracy. It then identifies and summarizes possible factors influencing data quality on the basis of the data production process, and shows that multi-source data fusion is an effective means of enhancing POI data quality. We found that: the volume of Amap data acquired through the API is slightly higher than that of Baidu, the spatial location accuracy of the two is comparable, and its redundancy rate is lower; Amap focuses on identifying the entrances of features, which suits analyses such as accessibility; Baidu focuses on discovering non-significant features, which suits analyses such as spatial planning; the discovery, acquisition and processing stages are the links at which data quality may be reduced, quality is also influenced by data protection mechanisms, and data quality is inversely proportional to the acquisition volume and area. The quality assessment, enhancement and integration of multi-source heterogeneous geographic data is one of the key ways to enhance the "emergent value" of data, promote interdisciplinary and cross-disciplinary research, and solve geographic problems in the new era.
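The three quality dimensions named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the record layout `(name, lat, lon)`, name-based matching against a surveyed reference set, the duplicate rule (same name and same coordinates rounded to five decimals) and the 50 m tolerance are all hypothetical.

```python
import math
from collections import Counter


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def completeness(poi_names, surveyed_names):
    """Share of surveyed features that also appear in the POI dataset (assumed definition)."""
    surveyed = set(surveyed_names)
    if not surveyed:
        return 0.0
    return len(surveyed & set(poi_names)) / len(surveyed)


def redundancy_rate(pois):
    """Share of records that repeat an earlier record (assumed duplicate rule)."""
    if not pois:
        return 0.0
    counts = Counter((name, round(lat, 5), round(lon, 5)) for name, lat, lon in pois)
    extra = sum(c - 1 for c in counts.values())
    return extra / len(pois)


def location_accuracy(pois, surveyed, tolerance_m=50.0):
    """Share of matched POIs lying within tolerance_m of the surveyed position."""
    matched = [(n, lat, lon) for n, lat, lon in pois if n in surveyed]
    if not matched:
        return 0.0
    ok = sum(
        1 for n, lat, lon in matched
        if haversine_m(lat, lon, *surveyed[n]) <= tolerance_m
    )
    return ok / len(matched)


# Hypothetical toy data, not taken from the paper.
pois = [("A", 41.80, 123.40), ("A", 41.80, 123.40), ("B", 41.81, 123.40), ("C", 41.90, 123.50)]
surveyed = {"A": (41.80, 123.40), "B": (41.80, 123.40), "D": (41.70, 123.30)}

c = completeness([n for n, _, _ in pois], surveyed.keys())
r = redundancy_rate(pois)
a = location_accuracy(pois, surveyed)
```

Computing the same three numbers for each provider's dataset against one surveyed reference would give a like-for-like comparison of the kind the abstract reports for Amap and Baidu.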

Models and Methods
Experimental study of population density using an optimized random forest model

LI Lingling, LIU Jinsong, LI Zhi, WEN Peizhang, LI Yancheng, LIU Yi

Acta Geographica Sinica. 2023, 78(5): 1304-1320. https://doi.org/10.11821/dlxb202305015


CSCD(5)

The random forest model is a mainstream method for accurately describing the laws of regional population distribution and the mechanisms that shape it. Taking Shijiazhuang as the experimental area and its endowment zones as the modeling units, we carried out stratified sampling on a hectare grid and conducted systematic experiments to determine the factors influencing population density. The random forest model was optimized across the whole workflow: zoned modeling, stratified sampling, factor selection and weighted output. Four main conclusions can be drawn: (1) Zoning before modeling prevented the model from confusing the population distribution laws of different zones. Sampling at the raster-cell level not only freed the training samples from the modifiable areal unit problem (MAUP), but also formally reduced the negative effect of the ecological fallacy. Stratified sampling ensured the stability of the maximum population density in the training samples. (2) The experiments to determine the factors influencing population density were conducted separately in each zone, and introducing these factors significantly improved the fit (R²) of the model. Distance to a settlement was the dominant factor influencing population density in every zone. The geographical mechanisms that influenced population distribution differed significantly between regions: innovation endowment factors had the strongest impact on population density in urban areas, while natural endowment factors had the strongest impact in rural areas. (3) The optimized combination of the population density prediction datasets significantly improved the robustness of the model. (4) The population density datasets showed multi-scale superposition. At the large scale, population density in the plain area was higher than in the mountain area, whereas at the small scale population density in urban areas was higher than in rural areas, reflecting a core-periphery pattern. The optimized scheme for the population density random forest model provides a unified technical framework for determining the factors that control local population distribution and the geographical mechanisms that influence it.
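The stratified sampling step in conclusion (1) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the strata, so the density bounds, the per-stratum quota, the cell layout `(cell_id, density)` and the function name `stratified_sample` are all hypothetical.

```python
import random


def stratified_sample(cells, bounds, per_stratum, seed=0):
    """Draw up to per_stratum grid cells from each density stratum.

    cells: list of (cell_id, density) tuples.
    bounds: ascending upper bounds of the strata; the last stratum is open-ended.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = [[] for _ in range(len(bounds) + 1)]
    for cell in cells:
        # The stratum index is the number of bounds the density exceeds.
        idx = sum(cell[1] > b for b in bounds)
        strata[idx].append(cell)
    sample = []
    for group in strata:
        k = min(per_stratum, len(group))  # sparse strata are taken whole
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical toy grid: 100 low-density cells plus one very dense cell.
cells = [(i, float(i)) for i in range(100)] + [(100, 5000.0)]
bounds = [10, 50, 1000]  # assumed stratum limits, persons per hectare
sample = stratified_sample(cells, bounds, per_stratum=5)
```

Because the sparse top stratum is taken whole, the densest cell is always in the training sample, which is one way to read the abstract's remark that stratified sampling stabilizes the maximum population density. A simple random draw of the same size would usually miss that cell.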