-
Abstract: Soil environmental variables are strongly spatially heterogeneous, which makes it hard to improve spatial interpolation accuracy. Interpolation methods that rely only on spatial correlation and spatial heterogeneity struggle to reach high accuracy. Machine learning methods can fuse information from multi-dimensional auxiliary variables to improve the interpolation accuracy of soil attributes, but they cannot effectively incorporate spatial position relationships to improve accuracy further. Building on the random forest spatial prediction framework, this study combined the spatial semi-variogram with the random forest algorithm and proposed a spatial random forest interpolation method that incorporates the semi-variogram. The proposed method was applied to soil heavy metal data from Xiangtan County, Hunan Province, to interpolate soil Cr, and its accuracy was compared with that of the random forest method, the distance-based random forest spatial prediction method, ordinary Kriging, and regression Kriging. The results showed that the proposed method improved accuracy by more than 10% over the traditional Kriging methods and by more than 5% over the newer machine learning spatial interpolation methods. In addition, the maps it produced showed a more reasonable spatial distribution and richer detail. The proposed method can therefore effectively combine auxiliary variable information with spatial position information and improve the interpolation accuracy of soil environmental variables.
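The semi-variogram that the method builds on is estimated from sample data before it can be combined with any predictor. The following is a minimal sketch of that estimation step only, assuming Matheron's classical estimator with equal-width distance bins; the abstract does not state which estimator or binning the authors used, and the function name, parameters, and sample values are hypothetical. It does not reproduce how the study feeds the semi-variogram into the random forest.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical (Matheron) estimator, an assumed choice for illustration:
    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over point pairs in lag bin h.

    Bin k covers separation distances in [k * bin_width, (k + 1) * bin_width).
    Returns one value per bin, or None for bins that contain no pairs.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        distance = math.dist(coords[i], coords[j])
        k = int(distance // bin_width)
        if k < n_bins:  # pairs beyond the last bin are ignored
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical sample: three points on a line with made-up concentrations.
sample_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sample_values = [1.0, 2.0, 4.0]
gamma = empirical_semivariogram(sample_coords, sample_values, bin_width=1.0, n_bins=3)
print(gamma)
```

Semivariance that rises with lag distance, as in this toy sample, indicates that nearby observations are more alike than distant ones, which is the spatial position information the abstract refers to.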
-
Key words:
- Spatial interpolation
- Random Forest
- Semi-variogram
- Machine Learning
- Regression Kriging
-
Figure 4. Mean absolute error (MAE) and root mean square error (RMSE) of the cross-validation results of different interpolation methods
RF: random forest; RFsp: random forest for spatial predictions framework; SRFsei: spatial random forest with semi-variogram interpolation; UK: universal Kriging; OK: ordinary Kriging.
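The two error measures named in the Figure 4 caption have standard definitions, sketched below for reference. The function names and the sample numbers are hypothetical and are not taken from the study's data or results.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: the average of |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: the square root of the mean squared difference."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Made-up held-out observations and their cross-validation predictions.
observed = [3.0, 5.0]
predicted = [0.0, 9.0]
print(mae(observed, predicted), rmse(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how each interpolation method behaves.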
Figure 6. Interpolation results (a) and standard deviation of error (b) of different interpolation methods
RF: random forest; RFsp: random forest for spatial predictions framework; OK: ordinary Kriging; UK: universal Kriging.