
Digital soil mapping is the creation of spatial soil information systems using field and laboratory methods coupled with spatial and non-spatial soil inference systems (Lagacherie, McBratney and Voltz, 2006). A digital soil map is a spatial database of soil properties that is based on a statistical sample of landscapes or regions and that permits functional interpretation, spatial prediction and mapping of soil properties relevant to soil management and policy decisions. A digital soil map provides (i) information on a soil’s capacity to provide ecosystem services (such as the ability to infiltrate water, produce crops and store carbon); (ii) a geographical representation of soil constraints (such as aluminum toxicity, carbon deficit and sub-soil restrictions) with known confidence; (iii) spatial targeting of management recommendations; and (iv) a baseline for change detection and impact assessment.
Digital soil mapping uses statistical models to predict soil functional properties and degradation prevalence at unobserved locations in the landscape. The most basic model for soil-landscape prediction can be written as:
s_i = f(Q)_i + e_i
where s_i is a soil property or condition of interest at a given geographical location i, Q is a vector of covariates (such as reflectance data from satellite images, digital terrain models and/or climate surfaces), and e_i is a residual term that represents the uncertainty left unexplained by f(Q). This is essentially the classical state factor model of soil formation, which states that soil condition, or more broadly ecosystem condition (L, for larger system), is a function of state factors including climate (cl), organisms (o), relief (r), parent material (p), system age (or time, t) and any other, typically more local and historically contingent factors (Jenny, 1941; Amundson & Jenny, 1997). The model implies that once the spatial distribution of state factors is known, specific soil properties or conditions may be inferred geographically on the basis of f(Q) and the residual (spatial) distribution of e.
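As an illustration of the basic model above, the following minimal Python sketch fits f as a linear function of the covariates by ordinary least squares and recovers the residuals e_i. The linear form of f, the two covariates (elevation and a vegetation index) and all data values are assumptions made up for this example; they are not project data, and f could equally be any other statistical model.

```python
# Sketch of s_i = f(Q)_i + e_i with f ASSUMED linear in the covariates.
# All numbers below are hypothetical and serve only as an illustration.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_f(covariates, observations):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(q) for q in covariates]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * s_i for row, s_i in zip(design, observations))
           for j in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, q):
    """Evaluate the fitted f at one covariate vector q."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], q))


# Hypothetical covariate vectors Q_i: (elevation in km, vegetation index).
Q = [(0.2, 0.61), (0.5, 0.48), (0.9, 0.35),
     (1.3, 0.52), (1.6, 0.30), (2.0, 0.44)]
# Hypothetical observed soil property s_i (e.g. organic carbon, %).
s = [1.9, 1.6, 1.1, 1.5, 0.8, 1.0]

beta = fit_linear_f(Q, s)
# e_i = s_i - f(Q)_i: the part of the observation not explained by f.
residuals = [s_i - predict(beta, q_i) for q_i, s_i in zip(Q, s)]
```

The residuals are the quantity whose spatial distribution is modelled in a second step, for example by kriging, when prediction at unobserved locations is required.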
A variety of statistical approaches have been used to parameterize this basic model. These differ in terms of their representational realism and computational complexity and include classical geostatistics (e.g., regression kriging and co-simulation), as well as more recent approaches based on hierarchical models (Pinheiro & Bates, 2002), generalized estimating equations (Liang & Zeger, 1986), additive models (Hastie & Tibshirani, 1990), and Markov chain Monte Carlo simulation (MCMC; Clark & Gelfand, 2006), among others. In the context of this project we will develop pragmatic guidelines as to when and how these different techniques can be used appropriately. The guidelines will be supported by comparative, worked examples and code implemented in the freely available R environment for statistical computing (http://www.R-project.org).
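To make one of these approaches concrete, the sketch below illustrates the second stage of regression kriging: residuals from a fitted trend f(Q) are interpolated to an unobserved location and added back to the trend prediction. It is written in Python for compactness and is not the project's R code. It assumes ordinary kriging of the residuals and an exponential variogram whose sill and range are simply asserted here; in practice they would be estimated from the data. The coordinates, residuals and trend value are hypothetical.

```python
import math


def exponential_variogram(h, sill=1.0, range_param=2.0):
    """Semivariance at lag h. sill and range_param are ASSUMED values."""
    return sill * (1.0 - math.exp(-h / range_param))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def krige_residual(coords, residuals, target):
    """Ordinary kriging estimate and kriging variance of the residual at target."""
    n = len(coords)

    def gamma(p, q):
        return exponential_variogram(math.dist(p, q))

    # Kriging system: semivariances between data points, bordered by the
    # unbiasedness constraint (weights sum to one) and its Lagrange multiplier.
    a = [[gamma(coords[i], coords[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(c, target) for c in coords] + [1.0]
    solution = solve_linear_system(a, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * e for w, e in zip(weights, residuals))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return estimate, variance


# Hypothetical sample locations (km) and residuals e_i from a fitted trend.
coords = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (0.5, 2.5), (3.0, 1.0)]
residuals = [0.12, -0.05, 0.20, -0.15, -0.08]

target = (1.5, 1.5)        # unobserved location
trend_at_target = 1.30     # hypothetical value of f(Q) at the target

kriged_residual, kriging_variance = krige_residual(coords, residuals, target)
prediction = trend_at_target + kriged_residual
```

The kriging variance returned alongside the estimate is one simple way to attach a location-specific uncertainty to each prediction. Note that it reflects only the residual component and ignores uncertainty in the fitted trend.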
We are producing digital soil maps using legacy data (e.g., from the existing ISRIC-WISE and SOTER databases as well as the new legacy data collection), using approaches and standards that will be fully compliant with the GDSM initiative. We will also produce digital soil maps using the sentinel site soil data together with Landsat and SRTM derivatives. Both types of maps are expected to be available for the entire project area as “version 1.0” products at the end of the 4-year period.
Characterization of additional sentinel sites would further reduce the statistical uncertainties in the underlying spatial models that will be developed under this activity. Based on the current sample, we will be able to describe these uncertainties in quantitative terms and thus provide spatially explicit recommendations as to where, and over what areal extent, additional sampling and surveillance activities should be undertaken.