Abstract:
Research on the spatial prediction of soil organic matter (SOM) with different models offers important guidance for designing sampling strategies scientifically and efficiently and for improving the accuracy of soil spatial prediction. A total of 6496 observed soil sites were divided into training and validation datasets by stratified random sampling at a ratio of 8:2. Ordinary kriging, random forest, and random forest-regression kriging were then used to predict the spatial variation of topsoil SOM in arable land in Xuchang, a prefecture-level region of Henan Province. Prediction accuracy was validated and model performance was evaluated. The factors dominating topsoil SOM content and its spatial variability in the study region were identified with the Boruta feature selection approach. According to the predictions of the three models, topsoil SOM contents of arable land in the region were moderately low, ranging from 18.70 to 18.81 g kg⁻¹, with coefficients of variation from 0.15 to 0.17. Arable land with higher topsoil SOM content was concentrated in the mountainous areas of the northwestern and southwestern parts, where cinnamon soil formed on loess parent material predominates, and in the southeastern part, where Shajiang black soil is distributed. Arable land with lower topsoil SOM content was found mainly in the central and northern parts of the study region. Validation showed that the three models performed similarly and produced similar predictions, each explaining 33% to 34% of the variance in topsoil SOM content of arable land. This indicates that the spatial prediction of SOM in this case study was of medium accuracy among studies at the same or similar scales. When covariates are limited and the sample points are relatively evenly distributed, ordinary kriging is a convenient way to quickly obtain the spatial distribution of the target variable in the study area. If covariates are abundant and easy to collect and use, the random forest model is recommended. When covariates are limited but the sample density is high, random forest-regression kriging may be a good choice for spatial prediction of the target variable.
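The validation design summarized above (a stratified random 8:2 split, followed by reporting the share of variance explained) can be illustrated with a minimal Python sketch. It is not the study's workflow. The abstract does not state the stratification variable, so the `stratum` label here is an assumption, the sites are synthetic, and the helper names `stratified_split`, `r_squared`, and `rmse` are hypothetical. A per-stratum training mean stands in for the three models purely to exercise the metrics.

```python
import math
import random
from collections import defaultdict


def stratified_split(samples, stratum_of, train_frac=0.8, seed=0):
    """Randomly split samples into training/validation sets within each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[stratum_of(sample)].append(sample)
    train, valid = [], []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        cut = round(train_frac * len(members))  # 8:2 ratio inside every stratum
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


def r_squared(observed, predicted):
    """Share of variance in the observations explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Synthetic sites (assumed values, not the study's data): three strata with
# different mean SOM contents in g/kg.
data_rng = random.Random(42)
sites = [
    {"stratum": name, "som": data_rng.gauss(mean, 3.0)}
    for name, mean, count in (("A", 17.0, 500), ("B", 19.0, 300), ("C", 21.0, 200))
    for _ in range(count)
]
train, valid = stratified_split(sites, stratum_of=lambda s: s["stratum"])

# Stand-in predictor: the training mean of each stratum. This is only a
# placeholder for ordinary kriging, random forest, or regression kriging.
by_stratum = defaultdict(list)
for site in train:
    by_stratum[site["stratum"]].append(site["som"])
stratum_mean = {key: sum(vals) / len(vals) for key, vals in by_stratum.items()}

observed = [site["som"] for site in valid]
predicted = [stratum_mean[site["stratum"]] for site in valid]
print(f"validation R2 = {r_squared(observed, predicted):.2f}, "
      f"RMSE = {rmse(observed, predicted):.2f} g/kg")
```

Splitting within each stratum keeps the training and validation sets at the same 8:2 proportion for every class, so the validation R² is not skewed by one class being over- or under-represented.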