Hyperspectral Estimation of Soil Organic Matter Content Based on Mahalanobis Distance and Minimum Spanning Tree

    Abstract:
      Objective  Taking paddy soils in Changsha and the surrounding area as the study object, this work optimizes the modeling dataset from the perspectives of feature space and sample space, with the aim of improving the accuracy of models that estimate soil organic matter (SOM) content from soil hyperspectral data.
      Method  Mahalanobis distance (MD) was used to optimize the soil spectral feature space, and a minimum spanning tree (MST) was used to partition the sample dataset and thereby optimize the modeling sample space. Hyperspectral SOM content estimation models were then built with cross-validated ridge regression (RidgeCV) and support vector regression (SVR).
      Result  On the test set, the spectrum-optimized models MD-RidgeCV and MD-SVR achieved coefficients of determination (R²) of 0.876 and 0.84, respectively, and the sample-optimized models MST-RidgeCV and MST-SVR achieved R² values of 0.847 and 0.815, respectively. The models that combine both optimizations, MD-MST-RidgeCV and MST-MD-RidgeCV, both reached R² values as high as 0.9. Compared with SOM estimation models built on the original dataset and on modeling sets selected by the KS and SPXY methods, the proposed approach showed better predictive performance on the test set.
      Conclusion  Optimizing the modeling dataset in both the spectral feature space and the sample space with Mahalanobis distance and a minimum spanning tree, and building hyperspectral SOM prediction models with the RidgeCV and SVR regression algorithms, can significantly improve model accuracy and stability.
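
The abstract names three quantities without defining them: Mahalanobis distance, a minimum spanning tree over the samples, and the coefficient of determination R². The following is a minimal sketch of each, written with the Python standard library only. It is not the authors' implementation: the abstract does not say how MD is used to select spectral features or how the MST is cut to split the samples, so those steps are left out, as is the fitting of RidgeCV and SVR. All function names and the two-band toy "spectra" are hypothetical and serve illustration only.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[v - m for v, m in zip(r, mu)] for r in rows]
    p = len(mu)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming the inverse."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, weight) edges."""
    n = len(points)
    # best[j] = (shortest known distance from the tree to j, tree node at the other end)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, i = best.pop(j)
        edges.append((i, j, w))
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: five samples, two spectral bands each.
spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
mu = mean_vector(spectra)
cov = covariance(spectra)
distances = [mahalanobis(s, mu, cov) for s in spectra]
tree = minimum_spanning_tree(spectra)
```

In this toy set the fifth sample lies far from the other four, so it is joined to the tree by the longest edge. Long MST edges are one common place to cut when a tree is used to partition samples, though whether the paper splits its dataset this way is an assumption.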

     
