Abstract:
Objective Using paddy soils in Changsha and its surrounding areas as the study object, this paper estimates soil organic matter (SOM) content from soil hyperspectral data and aims to improve the performance of the SOM content prediction model by optimizing the modeling dataset in both the feature space and the sample space.
Method Mahalanobis distance (MD) was used to optimize the soil hyperspectral feature space, and the sample dataset was split with a minimum spanning tree (MST) to optimize the modeling sample space. SOM content prediction models were then constructed on the optimized data using ridge regression with built-in cross-validation (RidgeCV) and support vector regression (SVR).
Result On the test set, the R2 values of the feature space optimization-based models MD-RidgeCV and MD-SVR were 0.876 and 0.84, respectively, and those of the sample space optimization-based models (MST combined with RidgeCV and SVR) were 0.847 and 0.815, respectively. The combined optimization models MD-MST-RidgeCV and MST-MD-RidgeCV achieved R2 values as high as 0.9. Compared with SOM content models built on the original dataset and with the modeling-set optimization methods KS and SPXY, the proposed models showed better prediction performance on the test set.
Conclusion Optimizing the dataset in both the hyperspectral feature space and the sample space, and coupling it with the regression algorithms RidgeCV and SVR, can significantly improve the accuracy and stability of the SOM content prediction model.
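
As a rough illustration of the final modeling and evaluation step only, the sketch below fits RidgeCV and SVR and reports R2 on a held-out test set. It is not the authors' pipeline: the data are synthetic stand-ins for spectra and SOM values, the random train/test split replaces the MD and MST optimization steps (whose details the abstract does not give), and the function name `fit_and_score`, the alpha grid, and the SVR settings are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_and_score(seed=0, n_samples=200, n_bands=50):
    """Fit RidgeCV and SVR on synthetic data and return test-set R2 per model.

    Hypothetical helper: the data are simulated, not soil spectra.
    """
    rng = np.random.default_rng(seed)
    # Synthetic "spectra" (samples x bands) and a linear target with small noise.
    X = rng.normal(size=(n_samples, n_bands))
    w = rng.normal(size=n_bands)
    y = X @ w + rng.normal(scale=0.1, size=n_samples)

    # Plain random split; the paper instead optimizes this split (MST) and
    # the feature space (MD), which is NOT reproduced here.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    models = {
        # RidgeCV picks the regularization strength by cross-validation.
        "RidgeCV": make_pipeline(
            StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13))
        ),
        # RBF-kernel support vector regression with an assumed C value.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, r2 in fit_and_score().items():
        print(f"{name}: test R2 = {r2:.3f}")
```

Scaling is placed inside each pipeline so that the test set never influences the fitted scaler; the scores printed for this synthetic data say nothing about the values reported in the abstract.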