Abstract:
Objective Soil properties often exhibit pronounced spatial heterogeneity under the influence of multiple environmental variables. Existing calibration set selection methods seldom account for this heterogeneity, so the selected calibration samples may lack global representativeness, which in turn reduces the predictive accuracy and robustness of the model. Building on classical calibration set selection methods, this study additionally considers how spatial heterogeneity patterns affect sample representativeness and proposes an improved calibration set selection strategy that takes these patterns into account.
Method First, the regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP) method was used to mine the spatial heterogeneity patterns of soil attributes and to obtain a geographically contiguous regionalization, in which the distribution of soil attributes is similar within each subzone and differs substantially among subzones. Next, classical calibration set selection methods, namely the concentration gradient (Rank) method, the Kennard-Stone (KS) method, and the SPXY method, were applied within each subzone to select locally representative samples. Finally, the representative samples from all subzones were combined into a calibration set that is representative of both global geographic spatial information and soil attribute values. The resulting methods are referred to as REDCAP-Rank, REDCAP-KS, and REDCAP-SPXY. To validate them, the methods were applied in northern Germany and compared with the traditional calibration set selection methods, using Partial Least Squares Regression (PLSR), Support Vector Machine (SVM), and Random Forest (RF) models to predict soil organic carbon (SOC).
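The within-subzone selection and pooling step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the subzone labels are assumed to come from an earlier REDCAP run (REDCAP itself is not reproduced here), the Rank variant shown (sort samples by SOC and take evenly spaced ones) is one common form of the concentration gradient method, and the proportional allocation of calibration samples to subzones is a hypothetical choice, since the abstract does not state how many samples each subzone contributes. All function names are illustrative.

```python
import math
from collections import defaultdict


def maxmin_select(n, k, d):
    """Kennard-Stone style max-min selection over n samples using distance d(i, j)."""
    k = min(k, n)
    if k <= 0:
        return []
    if n == 1:
        return [0]
    # Seed with the pair of samples that are farthest apart.
    pair = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: d(p[0], p[1]))
    selected = list(pair)[:k]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < k:
        # Add the sample whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda r: min(d(r, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def ks(X, y, k):
    """Kennard-Stone: distances computed on the spectra X only (y is ignored)."""
    return maxmin_select(len(X), k, lambda i, j: math.dist(X[i], X[j]))


def spxy(X, y, k):
    """SPXY: X-distance and y-distance, each scaled by its maximum, then summed."""
    n = len(X)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dx_max = max((math.dist(X[i], X[j]) for i, j in pairs), default=0) or 1.0
    dy_max = max((abs(y[i] - y[j]) for i, j in pairs), default=0) or 1.0
    return maxmin_select(n, k, lambda i, j: math.dist(X[i], X[j]) / dx_max
                         + abs(y[i] - y[j]) / dy_max)


def rank(X, y, k):
    """Rank (concentration gradient), one common variant: sort by y, take evenly spaced samples."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    k = min(k, len(y))
    if k <= 0:
        return []
    if k == 1:
        return [order[len(order) // 2]]
    step = (len(order) - 1) / (k - 1)
    return [order[round(m * step)] for m in range(k)]


def select_by_subzone(X, y, zones, n_cal, selector):
    """Run `selector` inside each subzone and pool the picks into one calibration set.

    zones: one subzone label per sample, ASSUMED to come from a prior REDCAP run.
    Hypothetical allocation rule: each subzone gets a share of n_cal proportional
    to its size (at least 1), so the pooled total can differ slightly from n_cal.
    """
    members = defaultdict(list)
    for idx, z in enumerate(zones):
        members[z].append(idx)
    chosen = []
    for _, idx in sorted(members.items()):
        k_z = max(1, round(n_cal * len(idx) / len(y)))
        local = selector([X[i] for i in idx], [y[i] for i in idx], k_z)
        chosen.extend(idx[i] for i in local)  # map local positions back to global indices
    return sorted(chosen)


if __name__ == "__main__":
    # Toy data: 2-band "spectra", SOC values, and subzone labels (all made up).
    X = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8], [0.2, 0.3], [0.8, 0.9], [0.5, 0.5]]
    y = [1.0, 2.0, 5.0, 1.5, 4.5, 3.0]
    zones = [0, 0, 1, 0, 1, 1]
    print(select_by_subzone(X, y, zones, n_cal=4, selector=ks))
```

Passing `rank`, `ks`, or `spxy` as `selector` gives the REDCAP-Rank, REDCAP-KS, and REDCAP-SPXY analogues in this sketch; the brute-force pairwise search is only suitable for small subzones.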
Result The results showed that the calibration sets selected by the REDCAP-Rank, REDCAP-KS, and REDCAP-SPXY methods improved modeling accuracy overall compared with the traditional calibration set selection methods. Compared with the KS method, REDCAP-KS improved the accuracy of all prediction models, with a maximum increase in $R_\mathrm{p}^2$ of 0.11 and a maximum RPD growth rate of 14.47%. Compared with the SPXY method, REDCAP-SPXY improved prediction accuracy in 93.3% of cases, with a maximum increase in $R_\mathrm{p}^2$ of 0.09 and a maximum RPD growth rate of 13.04%. Among the six methods (KS, REDCAP-KS, SPXY, REDCAP-SPXY, Rank, and REDCAP-Rank), REDCAP-KS gave the best modeling performance, with $R_\mathrm{p}^2$ reaching 0.71 and an RPD of 1.80.
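For reference, the evaluation statistics quoted in the results can be computed as in the sketch below. It assumes the usual definitions, $R_\mathrm{p}^2 = 1 - \mathrm{SS}_\mathrm{res}/\mathrm{SS}_\mathrm{tot}$ on the prediction set and RPD = standard deviation of observed values divided by RMSEP, and it treats the "RPD growth rate" as the relative change from the baseline method's RPD. The abstract does not spell these definitions out, and conventions differ (for example, sample versus population standard deviation), so this is an illustration rather than the study's exact procedure.

```python
import math
import statistics


def r2(obs, pred):
    """Coefficient of determination 1 - SS_res / SS_tot on the prediction set."""
    mean = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared error of prediction (RMSEP)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rpd(obs, pred):
    """Ratio of performance to deviation; ASSUMES sample SD (n - 1) of observed values."""
    return statistics.stdev(obs) / rmse(obs, pred)


def growth_rate(new, old):
    """Relative improvement in percent, e.g. RPD of a REDCAP variant over its baseline."""
    return (new - old) / old * 100
```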
Conclusion The REDCAP-based calibration set selection strategy selects samples that are representative of geospatial information, and a PLSR model built on the sample sets partitioned by the REDCAP-KS method better meets the requirements of hyperspectral inversion for SOC prediction.