A probabilistic approach to quantifying soil physical properties via time-integrated energy and mass input

Soils form as the result of a complex suite of biogeochemical and physical processes; however, effective modeling of soil property change and variability is still limited and does not yield widely applicable results. We suggest that predicting a distribution of probable values based upon the soil-forming state factors is more effective and applicable than predicting discrete values. Here we present a probabilistic approach for quantifying soil property variability through integrating energy and mass inputs over time. We analyzed changes in the distributions of soil texture and solum thickness as a function of increasing time and pedogenic energy (effective energy and mass transfer, EEMT) using soil chronosequence data compiled from the literature. Bivariate normal probability distributions of soil properties were parameterized using the chronosequence data; from the bivariate distributions, conditional univariate distributions based on the age and flux of matter and energy into the soil were calculated and probable ranges of each soil property determined. We tested the ability of this approach to predict the soil properties of the original soil chronosequence database and soil properties in complex terrain at several Critical Zone Observatories in the US. The presented probabilistic framework has the potential to greatly inform our understanding of soil evolution over geologic timescales. Considering soils probabilistically captures soil variability across multiple scales and explicitly quantifies uncertainty in soil property change with time.


Introduction
Pedogenic models that can be widely applied and easily utilized are paramount for understanding soil-landscape evolution, soil property change with time, and predicting future soil conditions.A mathematically simple, easily parameterized approach has yet to be developed that is capable of predicting current soil properties or recreating potential soil evolution with time.Here we address this knowledge gap through the development of a probabilistic model of soil property change capable of predicting soil properties across a wide range of terrains, climates, and ecosystems.
The state-factor approach has been one of the primary pedogenic models since its development in the late 1800s and early 1900s (Dokuchaev, 1883;Jenny, 1941).The soil statefactor approach (Jenny, 1941) assumes that the state of the soil system or specific soil properties (S) may be described as a function of the external environment, represented by climate (cl), biology (o), relief (r), parent material (p), and time (t): S = f (cl, o, r, p, t).This approach increased our understanding of soil variation across each factor, but more complex, multivariate approaches are generally not possible or difficult to derive from this formulation (Yaalon, 1975).From the original state-factor model have evolved pedogenic models that include functional (Jenny, 1961), energetic (Rasmussen and Tabor, 2007;Rasmussen et al., 2005Rasmussen et al., , 2011;;Runge, 1973;Smeck et al., 1983;Volobuyev, 1964), and mechanistic approaches (Finke, 2012;Minasny and McBratney, 1999;Salvador-Blanes et al., 2007;Vanwalleghem et al., 2013).However, many of these approaches are either limited to a site-specific basis, require a high degree of parameterization, or lack wide-scale applicability.
Published by Copernicus Publications on behalf of the European Geosciences Union.

C. Shepard et al.: A probabilistic approach to quantifying soil physical properties
Here we develop a simple probabilistic approach to predict soil physical properties using a large dataset of chronosequence studies.The model compresses state-factor variability into two key components (parent material and total pedogenic energy, defined in Sect.1.1) that were parameterized and calibrated using the chronosequence database.We hypothesized that a probabilistic approach predicts accurate ranges of soil physical properties based on the soil-forming environment.Additionally, we modified the model to include soil depth to capture the influence of redistributive hillslope processes to predict soil properties.We hypothesized that by including soil depth, the model would effectively predict the clay content in an independent dataset synthesizing soil and landscape variability in complex, hilly terrain from a wide range of environments.

Probabilistic model of soil property change
The model presented here is based on a reformulated statefactor model, where a location has a probability of displaying a range of differing soil morphologies and properties based upon the state factors, with some range of values more probable than others, meaning that the state-factor model (Jenny, 1941) may be restated as where the left-hand side of the equation, P (s 1 ≤ S ≤ s 2 ), represents the probability that a given soil will have a value located between a lower limit (s 1 ) and an upper limit (s 2 ) (Phillips, 1993b).Equation (1) can be restated more simply as where the original soil-forming state factors have been simplified to represent the fluxes of matter and energy into the soil system (P x ), incorporating the influence of climate and biology, and the initial state of the soil-forming conditions (L o ), incorporating the influence of the initial topography and original soil parent material and time or age of the soil system (t) (Jenny, 1961).Equation ( 2) was further simplified to make the approach operational.A quantitative measure of climate and biology was needed to represent the influence of P x on soil formation.We used a quantification of P x calculated from effective precipitation and biological productivity, termed effective energy and mass transfer (EEMT, J m −2 yr −1 ) (Rasmussen and Tabor, 2007;Rasmussen et al., 2005Rasmussen et al., , 2011)).EEMT provides a measure of the energy transferred to the subsurface, in the form of reduced carbon from primary productivity and heat transfer from effective precipitation, which has the potential to perform pedogenic work, e.g., chemical weathering and carbon cycling.Using EEMT as a simplification of P x , Eq. ( 2) was restated as (Rasmussen et al., 2011) (3) We further simplified Eq. ( 3) by combining the flux term EEMT and the age of the soil system (t).EEMT multiplied by the age of the soil system, i.e., EEMT × t, provides an estimate of the total energy transferred to the soil system over the course of its evolution, referred to here as total pedogenic energy (TPE, J m −2 ).The TPE provides an estimate of P x that incorporates soil age; thus, Eq. ( 3) may be restated as where at a certain point in time the probability of a soil property existing between s 1 and s 2 is a function of L o and TPE.
L o controls the spread or variation of the probability distribution P (s 1 ≤ S ≤ s 2 ) over time and the potential observable soil states, whereas TPE is proportional to the internal soil state at a given time (Jenny, 1961).Explicitly including time in Eq. ( 4) through TPE partially captures variation in soil property change attributable to topography and parent material.Soil residence time may be directly related to landscape position through topographic control on soil production and sediment transport and deposition (Heimsath et al., 1997(Heimsath et al., , 2002;;Yoo et al., 2007).Additionally, parent material modulates soil residence time through control on soil depth (Heckman and Rasmussen, 2011;Rasmussen et al., 2005), soil production, and sediment transport rates (Andre and Anderson, 1961;Portenga and Bierman, 2011).The initial conditions of the soil-forming system (L o ) are never fully known; however, representing the state of the soil system as a probable distribution of values, implicitly accounting for soil age, and not constraining the initial soil-forming conditions, the influence of initial conditions can be partially ignored, and hence we herein focus on modeling soil properties using only TPE.Quantitatively realizing Eq. ( 4) required the use of predetermined joint probability density functions parameterized with TPE and a selected soil physical property.Bivariate normal density functions were calculated to determine the probability of a soil property range given a TPE value.The bivariate density function was selected due to its simplicity and ease of parameterization; other bivariate density functions are available that may better fit the selected soil property data but are not considered here.The bivariate normal density distribution (Ugarte et al., 2008) was calculated as where ρ represents the Pearson correlation coefficient, µ x is the mean of TPE, µ y is the mean of the selected soil physical property, σ x is the standard deviation of TPE, and σ y is the standard deviation of the selected soil physical property.Using the bivariate normal density functions, conditional mean and variance values were calculated given a value of TPE; the conditional means and variances parameterized conditional univariate normal distributions for the selected soil physical properties.The conditional mean (Ugarte et al., 2008) was calculated as where µ Y |X=x is the conditional mean soil property value given a value for TPE.The conditional variance (Ugarte et al., 2008) was calculated as where σ 2 Y |X=x is the conditional variance of the soil property given a value of TPE.
Applying this approach required certain assumptions and simplifications.The model assumes that climate was constant over the entire duration of pedogenesis.The model makes no assumptions about the progressive and regressive processes that drive pedogenesis; by weighting all profiles equally, the net effects of both progressive (e.g., horizonation, clay accumulation, reddening) and regressive (e.g., haplodization, erosion, pedoturbation) pedogenic processes (Johnson and Watson-Stegner, 1987;Phillips, 1993a) are captured in the model structure.The model also does not consider the net effect of progressive and regressive pedogenic processes on the distribution of selected soil properties with depth.The model makes no assumptions about the initial soil-forming system, and we did not constrain the model to any particular initial condition for either parent material or geomorphic landform; the model simply describes the probability of a location exhibiting a range of soil properties based on TPE.The model assumes that all changes in soil physical properties are due to pedogenic processes.We used a bivariate normal distribution; consequently the model assumes that the data conform to a normal distribution.

Data collection and preparation
The probability distributions were parameterized using an extensive literature review of chronosequence studies.More than 140 chronosequence publications were identified using Google Scholar (www.scholar.google.com)and Thomson Reuters Web of Science (www.webofknowledge.com),44 of which contained the required data.Inclusion within the present study required the following: profile descriptions with horizon-level clay, sand, and silt content and soil depth; well-defined ages of the soil-geomorphic surfaces; and geographic coordinates or maps showing locations of the described profiles.The chronosequences spanned a wide range of geographic locations, ecosystems, climates, rock types, and geomorphic landforms (Fig. 1, Table S1 in the Supplement).The chronosequence soils spanned ages from 10 years to 4.35 Myr and depth ranges from 3.0 to 1460 cm, with mean annual temperature (MAT) and precipitation (MAP) ranging from −11.2 to 28.0 • C and 3.0 to 400 cm yr −1 , respectively.We were limited in site selection by the available data; as such we could not control for any bias that may exist with regard to site selection and reported soil property values.

Total pedogenic energy
The influence of both climate and vegetation at the locations of each soil profile was determined using effective energy and mass transfer (EEMT) (Rasmussen and Tabor, 2007;Rasmussen et al., 2005).EEMT quantifies the heat and chemical energy from effective precipitation and net primary productivity added to the soil system (Rasmussen and Tabor, 2007;Rasmussen et al., 2005Rasmussen et al., , 2011)).EEMT describes the energy added to the soil system that can perform pedogenic work, such as chemical weathering and carbon cycling.EEMT is adaptable to include specific energetic inputs to the soil system based upon the prevailing soil-forming environment, e.g., the energetics from added fertilizer in an agriculture field or the impact of human-induced erosion (Rasmussen et al., 2011).The EEMT values for each soil profile were extracted from a global map of EEMT derived from the monthly global climate dataset of New et al. (1999) at 0.5 • × 0.5 • resolution using ArcMap 10.1 (ESRI, Redlands, CA) (Rasmussen et al., 2011).For the chronosequence soils, EEMT values ranged from 2235 to > 200 000 kJ m −2 yr −1 .Total pedogenic energy (TPE, J m −2 ) was derived simply by multiplying EEMT (J m −2 yr −1 ) for each soil profile by its reported age (yr).TPE was used because it was a better predictor of soil physical properties relative to mean annual temperature, mean annual precipitation, or net primary productivity (NPP) (Table 3).
C. Shepard et al.: A probabilistic approach to quantifying soil physical properties

Application to chronosequence data
The chronosequence database included 44 distinct chronosequences representing 405 different soil profiles.We focused here on changes in sand, silt, and clay content and solum thickness as examples of soil property change with time.We tested the approach on depth-weighted (DWT) sand, silt, and clay content (reported as weight %), as well as the maximum measured value of sand, silt, and clay content within each soil profile.Buried horizons were removed from the soil profiles before either the maximum or DWT content values were calculated.Solum thickness was extracted for each profile, defined as the thickness of the horizons influenced by pedogenic processes or the depth to C horizons (Schaetzl and Anderson, 2005).The site RW-14 from McFadden and Weldon (1987) was not included in the solum thickness model calculations; the measured solum thickness of RW-14 was 1460 cm, 1 order of magnitude greater than all other soil profiles included in the study.Four hundred and five profiles reported clay content data, only 387 profiles reported sand and silt content, and 399 soil profiles contained a developed solum.We classified the soil profiles by parent material in terms of igneous, metamorphic, or sedimentary material and by geomorphic landform, e.g., alluvial surface, marine terrace, or moraine (Shoeneberger et al., 2012); for example, if a soil was formed on an alluvial fan from granitic parent material, it would be defined as alluvial and igneous.
Using the soil data, we calculated bivariate normal probability distributions using TPE and the soil physical properties (Eq.5).The soil data were transformed using logarithmic and square root transformations when appropriate to meet the normality assumption of the bivariate normal probability distribution.Conditional univariate normal distributions (Eqs.6, 7) were calculated to approximate probable ranges of soil properties using leave-one-out cross validation (LOOCV).Each of the soil chronosequences was removed from the model dataset, with the all remaining chronosequence data used to calculate the parameters of the bivariate and conditional univariate normal distributions.The conditional univariate normal distributions were calculated using the TPE values for the profiles within the left-out chronosequence.

Application to complex terrain
By design, soil chronosequences are generally sited on gentle, low, sloping terrain to minimize the influence of topography and erosion/deposition on soil formation (Harden, 1982).However, much of the Earth's surface is characterized by complex topography with high relief, steep slopes, and differences in slope aspect.Any predictive soil model or approach must be effective in both simple and complex terrain.To test the ability of the model to predict soil properties in complex terrain, we compiled data from upland catchments with variable parent material and topography from the lit-erature, as well as data available from the US NSF Critical Zone Observatory Network (CZO, www.criticalzone.org)(Table 1) (Bacon et al., 2012;Dethier et al., 2012;Foster et al., 2015;Holleran et al., 2015;Lybrand and Rasmussen, 2015;Rasmussen, 2008;West et al., 2013).Data from several additional studies from complex terrain were also included to test the model (Table 1) (Dixon et al., 2009;Yoo et al., 2007).These data were accessed from www.criticalzone.org or Google Scholar (www.scholar.google.com).These studies were included because they all contained horizon-level soil texture data, soil depth, percent volume rock fragment data, and 10 Be or U series measures of soil erosion rates or residence time, where mean residence time (MRT) was calculated as MRT = h/E, where h is soil depth (m) and E is erosion rate (m yr −1 ) (Pelletier and Rasmussen, 2009b).We used published coordinates to extract EEMT values, calculated from New et al. (1999), for each soil profile using Ar-cGIS 10.1 and used EEMT and MRT to calculate TPE.It should be noted the coarse resolution of New et al. (1999) EEMT values does not account for local-scale variation in water redistribution and primary productivity that can lead to significant topographic variation in EEMT (Rasmussen et al., 2015).Using Eq. ( 5) and the parameters generated from the chronosequence database, conditional mean depth-weighted clay content was calculated for each profile.
Due to the influence of redistributive hillslope processes on soil development (Yoo et al., 2007), soil depth varies systematically across hillslopes (Heimsath et al., 1997); thus, soil depth can be used to incorporate information about these processes within the model calculations.We calculated the mass per area clay content of these profiles using soil depth to incorporate this variation, as mass per area clay kg m −2 (8) where ρ b is the soil bulk density assumed to be 1500 kg m −3 for all soil profiles, µ Y |X=x, DWT CLAY is the predicted conditional mean for depth-weighted clay content (DWT CLAY) using Eq. ( 6), RF% is the measured depth-weighted percent volume rock fragments within the soil (when no RF% data were available we assumed a value of 41.7 %, which was the average RF% for profiles with reported values), and h is the soil depth in meters.Using Eq. ( 8), mass per area clay was calculated for each soil profile.Further, we examined the impact of depth, rock fragment percentage, and predicted conditional mean DWT clay on the predicted mass per area clay predictions using multiple linear regression.

Coupling geomorphic model with probabilistic model
Additionally, we applied the probabilistic model independent of measured soil data, across a small complex catchment in the Santa Catalina Mountains (Catalina-Jemez CZO, Fig. 2ab, Table 1) (Holleran et al., 2015;Lybrand and Rasmussen, 2015).The ∼ 6 ha catchment is located at an elevation between 2300 and 2500 m with mixed conifer vegetation, approximately 30 km northeast of Tucson, AZ (Fig. 2, Table 1).The approach utilized soil depth and residence time output from a process-based numerical soil depth model (Pelletier and Rasmussen, 2009a).The model used high-resolution lidar-derived topographic data to estimate 2 m pixel resolution soil depth and erosion rates (Fig. 2c) (Pelletier and Rasmussen, 2009a).These data were coupled with topographically resolved EEMT values that accounted for local hillslope-scale variation in water redistribution and primary productivity at a 10 m pixel resolution (Rasmussen et al., 2015) (Fig. 2d).We used calculated TPE from the topographically resolved EEMT and soil residence time values to predict DWT clay and coupled predicted DWT clay values with modeled depth from Pelletier and Rasmussen (2009a) in Eq. ( 8) to predict mass per area clay at a 2 m pixel resolution; the data processing and model apparatus are shown in Fig. 3.We assumed a constant 50 % rock fragment value for each location.The coupled geomorphic-TPE model outputs were compared with point measures of mass per area clay from Holleran et al. (2015) and Lybrand and Rasmussen (2015).Model data were completely independent of the Holleran et al. (2015) and Lybrand and Rasmussen (2015) datasets such that they served as validation data for the modeled output.

Model domain
The model was parameterized using chronosequence studies; as such, the model is best suited for generally low, sloping terrain.The model was extended to complex terrain using the described correction above (Sect.2.4), widening the model domain to steeply sloping terrain.The model does not consider human activities or aeolian additions and should not be extended to soils significantly impacted by either humans or dust.The model was trained on a diverse array of parent materials and ecosystems and could be utilized in climates with MAT ranging from −10 to 28 • C and MAP ranging from 3 to 400 cm yr −1 .The model could be utilized on soils spanning multiple magnitudes in age, from 10 years to greater than 4 Myr.

Application and parameterization to chronosequences
The relationships between TPE and soil texture and solum thickness were used to calculate the bivariate probability distributions.The bivariate probability distributions (Eq.5) were parameterized using the means, standard deviations, and Pearson's correlation from the chronosequence database (Table 2).Furthermore, the relationship between TPE and the soil properties was stronger than just using age, NPP, MAP, or MAT alone (Table 3).Age was expected to strongly correlate to the soil properties due to the design of chronosequence studies; however, comparing age and TPE separately, the percent increase in Spearman rank correlations (r) ranged from 8.7 % (DWT silt) to 25.6 % (max sand).Maximum and depth-weighted silt content were weakly correlated to both age and TPE and exhibited only a minimal change in Spearman's rank correlation with TPE relative to age.The correlation between TPE and maximum clay content (Fig. 4, Pearson's ρ = 0.78, r 2 = 0.62, √ Max Clay = −7.38 + 1.37 • log(TPE), df = 403) was highly significant and presented the strongest probabilistic relationship determined between TPE and the soil properties.The bivariate probability surface displayed the greatest probability around the joint means between TPE and maximum clay content (Fig. 4).Solum thickness and TPE were also strongly re- lated, but weaker relative to the maximum clay-TPE relationship (Fig. S1 in the Supplement, Pearson's ρ = 0.65, r 2 = 0.42×log(solum thickness) = −0.58+0.27•log(TPE),df = 397).The relationships between TPE and max sand (Fig. S2) and silt (Fig. S3) contents were generally weaker, relative to clay and solum thickness, with little to no relationship between TPE and silt content.

Application in complex terrain
The model was much less effective in complex terrain and highly overpredicted DWT clay contents in soils located in complex landscapes (Fig. 7a, r 2 = 0.26, y = 0.39x + 7.36, p<0.0001,RMSE = 5.4 %).The model highly overpredicted the clay content of the South Carolina site and the Gordon Gulch soils and underpredicted the clay content of the Rincon, Santa Catalina, and Jemez sites.LOOCV results for max clay content.The results were subdivided by general soil parent material: igneous, metamorphic, and sedimentary; the points represent the geomorphic surface each soil formed on, and the colors represent the EEMT value for the location of each soil.Using LOOCV, where one chronosequence was removed from the model dataset and the remaining datasets were used to predict the parameters of the bivariate distributions, the conditional means of the left-out chronosequence was determined.The model was effectively able to predict the conditional mean values of the max clay contents with an r 2 = 0.54 (RMSE = 14.8 %).The model was least capable of predicting the clay contents on coral reef terraces (+) and appeared the most effective for alluvial surfaces ( ).When correcting for the influence of hillslope processes by explicitly including soil depth and calculating mass per area clay, the approach effectively predicted clay content, with an r 2 = 0.81 (Fig. 7b, y = 1.58x − 15.5, p<0.0001,RMSE = 86.4kg clay m −2 ), only slightly overpredicting clay content, with a regression slope of 1.58.Soil depth was the strongest contributing factor to the mass per area clay prediction with the greatest sums of squares in a simple multiple linear regression including depth, RF%, and DWT clay% (Table 4); predicted conditional mean clay content percentage was the second strongest contributing factor to the mass per area clay prediction.Rock fragment percentage did not influence the mass per area clay content prediction.

Coupled geomorphic-TPE model
The coupled geomorphic-TPE model effectively predicted mass per area clay for the majority of soils located within the Marshall Gulch sub-catchment with an r 2 = 0.74 (Fig. 8a, y = 0.86x − 5.06, p<0.0001,RMSE = 17.7 kg clay m −2 ).For a subset of soils, the model did not effectively predict mass per area clay, and this was excluded from the regression in Fig. 8a; four of these soils were located on the east-facing ridge of the catchment, and an additional two soils were formed on amphibolite rather than the granite or quartzite materials that all of the other soils in the catchment were derived from.All of these locations also exhibited a poor fit between modeled and measured soil depth (Fig. 2e).The spatial distribution of mass per area clay was also predicted across the catchment (Fig. 8b), independently of measured data, and generally conformed to previously predicted spatial distribution of clay stocks in the Marshall Gulch catchment (Holleran et al., 2015).9).The model was incapable of directly predicting DWT clay for the soils in complex terrain due to redistributive hillslope processes; r 2 = 0.26 between measured and predicted conditional mean DWT clay (a).By including information about soil depth and percent volume rock fragment and converting DWT clay to mass per area clay, the model was significantly more effective at predicting clay contents for these soils r 2 = 0.81.

Model results for chronosequences
The model predicted maximum clay content across a diverse range of lithologies, climates, and landforms.Weathering and clay production are primary pedogenic processes (Birkeland, 1999;Schaetzl and Anderson, 2005), and because the model assumed that all changes in the soil profile are due to these processes and TPE is closely related to degree of weathering, the model was the most effective at predicting clay content.For initial soil states that begin pedogenesis with a potentially significant amount of claysized particles the model was much less effective.The soils of the Taiwanese chronosequence formed from conglomerates (Huang et al., 2010); conglomerates are typically poorly sorted, such that these soils initially formed with high clay contents slowing clay accumulation, limiting the effectiveness of the model to predict clay contents in these soils.Additionally, the model highly underestimated the clay content of soils located on coral reef terraces in tropical environwww.soil-journal.net/3/67/2017/SOIL, 3, 67-82, 2017 ments (Maejima et al., 2005;Muhs, 2001).Coral reef terraces represent a relatively unique landform that weathers rapidly to fine-sized particles, especially under tropical climates, and generally have complicated parent material compositions (Muhs et al., 1987).The combination of these factors limited the ability of the model to predict the soil properties on these surfaces.Sand and silt displayed weaker relationships with increasing total pedogenic energy.The lack of correlation of sand and silt to TPE may result in part from the definitions of the particle size classes.Sand-sized particles span a difference in particle size of several orders of magnitude, ranging from particles of 2 to 0.05 mm (Soil Survey Staff, 2010), whereas clays are constrained to a particle size less than 0.002 mm.The sequential weathering of rock fragments and coarse sand to fine and very fine sands therefore is not reflected in total sand content and likely diminishes the relationship between sand content and total pedogenic energy and time (Pye and Sperling, 1983;Pye, 1983;Sharmeen and Willgoose, 2006).The relationship between silt content and pedogenic energy was the weakest of the three broad particles size classes (Tables 2, 3).Similar to sand, the silt size fractions span 1 order of magnitude in particle size ranging from 0.05 to 0.002 mm in diameter.Further, the sand and silt fractions are dominated by resistant primary minerals (Pye, 1983) and would not change greatly in response to increased TPE or weathering, which may partly account for the weaker correlations with TPE.Additionally, the silt fraction may also be heavily influenced by the deposition of eolian material and thereby introduce an additional mass of silt that was not derived from the direct weathering of the initial soil-forming system (Mc-Fadden et al., 1987) effectively uncoupling silt content from total pedogenic energy.
Solum thickness displayed a relatively strong relationship with increasing pedogenic energy, with TPE explaining up to 42 % of the variance in solum thickness (Tables 2, 3).Soil production is related to climatic variation (Amundson et al., 2015), with this variation partly captured by EEMT and TPE, leading to the slightly stronger predictive power of the model.However, soil production is also highly influenced by redistributive hillslope process, chemical and physical weathering, and tectonic uplift (Heimsath et al., 1997;Riebe et al., 2004;Yoo and Mudd, 2008b) and can be a highly nonlinear process (Pelletier and Rasmussen, 2009a).These factors were not directly accounted for in this study in that topography was not a quantified factor, which likely represents a large proportion of the remaining unexplained variance in solum thickness.

Model results in complex terrain
Due to using soil chronosequence data to parameterize the approach, the influence of redistributive hillslope processes was not captured.Additionally, in the amount of time required to transport soil across a hillslope, chemical and phys-ical alterations of the soil particles are possible and may not be reflected in mean residence time calculations (Yoo and Mudd, 2008a;Yoo et al., 2007).Soil thickness is highly dependent upon hillslope position and landscape morphology (Dietrich et al., 2003;Heimsath et al., 1997;Pelletier and Rasmussen, 2009a).By using soil thickness as a proxy for the strength of these redistributive hillslope processes and converting the predicted conditional mean clay content value to a mass per area basis, the model was able to capture differences in clay content across complex terrain for a variety of lithologies and climates.The differing lithologies, climates, or vegetation types did not appear to impact the ability of the model to predict clay contents, likely because local variation in soil depth accounts for many of these controls.Parent material and climate influence the weathering process and production of clay in soils (Harden and Taylor, 1983;Muhs et al., 2001); however, these factors are collinear with soil depth (Heckman and Rasmussen, 2011;Lybrand and Rasmussen, 2015;Pelletier and Rasmussen, 2009a), such that by including soil depth, differences due to lithology or climate were partly incorporated in the model prediction.

Results from coupled geomorphic-TPE model
For the majority of sites in the Marshall Gulch subcatchment, the coupled geomorphic-TPE model was highly effective at predicting clay content and the spatial distribution of clay stocks.Large differences were found for four soils located on the east-facing ridge of the catchment underlain by granite, with the model generally overpredicting soil depth and clay content.Discrepancies between the modeled and measured depths were likely the primary sources of error within the mass per area clay predictions for the four eastfacing ridge soils (Fig. 2e).The geomorphic model predicted deeper soil depths due to the presence of an apparent convergent zone on the east-facing ridge of the sub-catchment; however, this convergent zone is only a small feeder tributary to the larger catchment drainage.The inability of the model to effectively predict clay contents and the mismatch between modeled and actual soil depths in the four soils located on the east-facing ridge is likely due to this local, finescale topographic variation.The fine-scale topographic variation may indicate that the scale of soil property predictions is important in achieving accurate predictions.Fine spatial scales match the scale of local soil-landscape variation and processes, but fine-scale variation in weathering rates and lithology is also required to better predict soil depth within the catchment (McKenzie and Ryan, 1999).
Error in predicted soil depths due to fine-scale differences in lithology within the Marshall Gulch sub-catchment partly explains the discrepancies between measured and predicted mass per area clay contents.For two amphibolite-derived soils, the model greatly underestimated mass per area clay.The geomorphic soil depth model assumed a uniform weathering rate based on the granitic soils (Pelletier and Ras-  mussen, 2009a); due to differences in primary mineral assemblage, the amphibolite materials are likely weathering at a faster rate compared to the granite-derived soils (White et al., 2001;Wilson, 2004), resulting in greater clay production and likely explaining the underestimated clay contents.Inclusion of differential weathering rates for varying lithologies within the geomorphic model would likely lead to better prediction of clay contents, but in areas of complex lithology this would require detailed information about distributions of differing lithologies.With these adjustments, the coupled geomorphic-TPE model represents an effective, independent prediction of clay stocks.

Advantages of probabilistic approach
Simplifying and representing the soil-forming factors as multivariate distributions and probabilities has the potential to quantitatively represent the general state-factor model, making the approach universally applicable.The initial state of the soil can likely never be fully known, leading to variability in soil properties over time that cannot necessarily, or ever, be attributed to any external factor (Phillips, 1989(Phillips, , 1993b)).A probabilistic approach utilizes that variability to drive predictions and understanding of these systems.Similar to the approach taken here, building distributions of the soil-forming state factors that are associated with distributions of particular soil properties could yield probabilistic predictions of soil formation and change.We selected to use a representation of climate and biology (EEMT).However, depend-ing on the soil property of interest, the variables needed to parameterize the distributions would likely change; for example, if interested in organic matter content, aboveground net primary productivity or the normalized difference vegetation index may be better predictors of organic matter accumulation.The strength of this approach lies in the fact that no assumptions are made about the initial conditions of the soil-forming system or the specific soil-forming processes.
Predicting probable distributions of soil physical properties implicitly acknowledges that our understanding of any system is incomplete but explicitly quantifies uncertainty in predictions and constrains the potential observable values to a predicted range.Utilizing this approach will require the necessary data to build distributions that are widely representative and applicable to most locations (Yaalon, 1975).With wide accessibility to large databases of soil information, such as the US National Soil Information System (NA-SIS) and the FAO Harmonized World Soil Database, access to the required amount and quality of data may be possible.Similar to the present study, simple bivariate distributions could be solved to calculate conditional distributions based on the soil-forming state factors, effectively producing quantitative probabilistic representations of Jenny's original equation (Jenny, 1941).The simplicity of the present approach allows easy integration into preexisting geomorphic models of landscape evolution.Past approaches that have combined pedogenic and landscape evolution models have generally focused www.soil-journal.net/3/67/2017/SOIL, 3, 67-82, 2017 on producing hypothetical soil-landscape relationships that progress forward through time (Minasny and McBratney, 2001;Vanwalleghem et al., 2013) or have focused on idealized landscapes (Temme and Vanwalleghem, 2015).However, by combining probabilistic approaches parameterized using known landscapes and geomorphically based landscape evolution models, predictions of the current state of the soil landscape can be investigated.As was demonstrated in Fig. 7b, combining the present approach with geomorphically based soil depth models generated from DEMs has great potential to predict soil properties across a diverse range of environments, without needing prior knowledge of the landscape other than topography and climate.Further, potential soil landscapes can be investigated by updating EEMT values to incorporate future climate scenarios available from predictive climate models (Gent et al., 2011;Taylor et al., 2012) and topographic and hydrological impacts due to changes in topography over time (Rasmussen et al., 2015).

Limitations and potential refinements
There are obvious limitations within the current model: a lack of consideration of parent material influences, topographic variation, human impacts, internal soil feedbacks and thresholds, determination of landscape and soil age, and differences in paleoclimate variation.Parent material control on the relative proportion of weatherable minerals and mineral weathering rates (Jackson et al., 1948) can manifest itself as vastly different soil morphologies and rates of pedogenesis when controlling for other soil-forming factors or even without controlling for other factors (Heckman and Rasmussen, 2011;Parsons and Herriman, 1975;Phillips, 1993b).The current approach implicitly assumes no information about the initial conditions, only that all clay production is a pedogenic process.The application of this approach to parent materials, where a large fraction of clay-sized particles formed through non-pedogenic processes, is thus limited and may explain why the model was ineffective for some soils.
Refining the current approach would require normalization of soil to the particle size distribution of the soil parent material.Past studies have utilized highly characterized parent material data to model soil property change with time (Chadwick et al., 1990;Harden, 1982), but these data are generally difficult to obtain and often not reported in the available chronosequence literature.Topography dictates soil chemical and physical properties and residence times, especially in complex terrain (Almond et al., 2007;Egli et al., 2008;Lybrand and Rasmussen, 2015), where nonlinear diffusive hillslope processes control the fluxes of matter and energy into and out of the soil system (Heimsath et al., 1997;Pelletier and Rasmussen, 2009a;Rasmussen et al., 2015;Yoo and Mudd, 2008b;Yoo et al., 2007).Using earlier versions of EEMT (Rasmussen and Tabor, 2007;Rasmussen et al., 2005), the current formulation of the model and TPE does not explicitly quantify topographic variation, which may account for error within current soil property distributions and predictions.With the inclusion of topographic variation in EEMT (Rasmussen et al., 2015) and topographic control of soil residence times (Foster et al., 2015;West et al., 2013), we were able to correct this error with the present approach and effectively predicted clay stocks in complex terrain.
Human activities significantly alter soil physical properties (Grieve, 2001;Neff et al., 2005;Pouyat et al., 2007).For example, differences in land use and increased grazing activity can alter soil physical properties such as clay and sand content across landscapes (Neff et al., 2005;Pouyat et al., 2007) or compaction from farming equipment leading to increased bulk density and increased erosion rates (Fullen, 1985;Hamza and Anderson, 2005).Human impacts on soil physical properties were not included in the presented model.The energetic contributions due to human impacts can be incorporated within the EEMT apparatus, and adjusted model parameters can be calculated (Rasmussen et al., 2011).Human impacts on soil physical properties may be locally important, but for the majority of locations, human energetic contributions to the soil system are generally orders of magnitude smaller compared to the energetic inputs from solar radiation, precipitation, or primary productivity.
Internal or intrinsic feedbacks and thresholds within the soil system drive pedogenic development without changes in the external state factors (Chadwick and Chorover, 2001;Muhs, 1984).For example, greater chemical weathering and clay production due to increased water residence time caused by argillic horizon development is the result of an internal feedback that is independent of the external climatic and biological system (Schaetzl and Anderson, 2005).These thresholds can operate as progressive or regressive processes, driving soil formation forward or hindering further development (Johnson and Watson-Stegner, 1987;Phillips, 1993a).Internal soil development feedbacks were not explicitly considered in the present model formulation.The presence of these internal feedbacks may partially explain error within the model predictions.Changes in EEMT would not explain all observed differences in soil properties over the age of the soil.However, if these feedbacks were operating in the included soils, the influence of intrinsic thresholds was implicitly captured within the probability distributions, partially accounting for the role of internal soil development feedbacks on soil formation.
Soil age is typically unmeasured in most geomorphological and pedological studies, limiting the applicability of the current model.Numerical age dating, e.g., cosmogenic radionuclides or optically stimulated luminescence, is expensive and requires time-consuming preparation to be broadly utilized and can be complicated by transport and burial histories of soil and sediment (Anderson et al., 1996;Bierman, 1994;Gosse and Phillips, 2001;Granger and Muzikar, 2001;Schaetzl and Anderson, 2005).Fortunately, relative-age dating methods using landscape position are easily utilized and can provide the necessary age constraint needed to make model predictions (Burke and Birkeland, 1979;Favilli et al., 2009;Huggett, 1998;Matthews and Shakesby, 1984;Nicholas and Butler, 1996;Schaetzl and Anderson, 2005).Age constraint may also be achieved using landscape or hillslope morphology derived from elevation transects or digital elevation models to estimate a "diffusivity age" for the soil (Hsu and Pelletier, 2004;Pelletier et al., 2006).
Global climate patterns have shifted dramatically over the last 65 Myr (Zachos et al., 2001).The majority of soils observed in the compiled chronosequence database span the Quaternary, including both the Holocene and Pleistocene.The Pleistocene was marked by a number of major glacialinterglacial cycles at approximately 100 000-year intervals (Imbrie et al., 1992;Wallace and Hobbs, 2006), which corresponded with shifting climatic conditions; e.g., for large portions of the northern midlatitudes glacial periods were generally cooler and wetter and interglacial periods were warmer and drier (Connin et al., 1998;Petit et al., 1999).Further, the Pleistocene climate shifts likely influenced the rates of weathering and clay production (Hotchkiss et al., 2000).Taking into account the differences in past and modern climate would partially reduce prediction errors between observed and modeled soil physical properties.Reconstructed global paleo-EEMT values would improve model accuracy and limit uncertainty in the probabilistic ranges of soil properties for soils older than Holocene age.

Conclusions
The present approach effectively predicts soil physical properties across a diverse range of geomorphic surfaces, lithologies, ecosystems, and climates.Further, this approach is mathematically simple and only requires knowledge of the probable age of a geomorphic surface and the effective energy and mass transfer value associated with a given location, making this approach universally applicable.The simplicity of the probabilistic approach lies in the lack of the need to consider the initial conditions of the soil-forming state or the processes driving soil property change.A probabilistic approach does not exactly predict a soil physical property value at a given location but constrains the probable values based upon the state of the external environment to the soil.Using probabilistic approaches, we can model probable soillandscape evolution scenarios, greatly informing our understanding of the evolution of critical zone structure.

Figure 1 .
Figure 1.Map of study sites.Yellow points indicate location of chronosequences, and red triangles indicate location of soils in complex terrain.

Figure 2 .
Figure 2. Marshall Gulch study site.(a) Location of the Santa Catalina Mountains and the Marshall Gulch catchment within Arizona, USA; (b) elevation of the granite sub-catchment of Marshall Gulch; (c) predicted soil depth in the granite sub-catchment (Pelletier and Rasmussen, 2009a); (d) EEMTv2.0 in the granite sub-catchment(Rasmussen et al., 2015); (e) mismatch between the measured soil depths and predicted soil depths.

Figure 4 .
Figure 4. Bivariate normal distribution between TPE and max clay content.The points indicate individual soils.The red ellipses represent lines of equal probability, which corresponds to a threedimensional probability distribution.From this relationship the conditional mean and variances for the soil physical properties were calculated.
Figure5.LOOCV results for max clay content.The results were subdivided by general soil parent material: igneous, metamorphic, and sedimentary; the points represent the geomorphic surface each soil formed on, and the colors represent the EEMT value for the location of each soil.Using LOOCV, where one chronosequence was removed from the model dataset and the remaining datasets were used to predict the parameters of the bivariate distributions, the conditional means of the left-out chronosequence was determined.The model was effectively able to predict the conditional mean values of the max clay contents with an r 2 = 0.54 (RMSE = 14.8 %).The model was least capable of predicting the clay contents on coral reef terraces (+) and appeared the most effective for alluvial surfaces ( ).

Figure 6 .
Figure 6.Selected relationships between the measured maximum clay content and predicted maximum clay content.(a) Harden (1987), (b) Howard et al. (1993), (c) Birkeland (1984), (d) Muhs (2001), (e) Huang et al. (2010), and (f) Merritts et al. (1991).The errors represent the conditional standard deviations around the mean, which correspond to a probability of 50 %.The model effectively predicted clay content across a diverse range of climates, landforms, and parent materials.The model was the least effective at predicting the clay content of soils in tropical climates and soils forming on coral reef terraces.

Figure 7 .
Figure 7. Model results in complex terrain.(a) Prediction of depthweighted (DWT) clay contents; (b) prediction of mass per area clay using Eq.(9).The model was incapable of directly predicting DWT clay for the soils in complex terrain due to redistributive hillslope processes; r 2 = 0.26 between measured and predicted conditional mean DWT clay (a).By including information about soil depth and percent volume rock fragment and converting DWT clay to mass per area clay, the model was significantly more effective at predicting clay contents for these soils r 2 = 0.81.

Figure 8 .
Figure 8. Model results of coupled geomorphic-EEMT-TPE model in Marshall Gulch granite sub-catchment.(a) Prediction of mass per area clay for sites from Holleran et al. (2015) and Lybrand and Rasmussen et al. (2015); (b) spatial prediction of mass per area clay.When combining the present approach, with a geomorphicbased soil depth model, the combined models together were highly effective at predicting the clay contents for a majority of soils in the Santa Catalina Mountains (Catalina-Jemez CZO), r 2 = 0.74.

Table 1 .
Complex terrain study sites and characteristics.

Table 2 .
Parameters for the bivariate normal probability distributions for the soil physical properties and TPE; n is number of profiles; µ is mean; σ is standard deviation; and ρ is Pearson's correlation between soil variables and total pedogenic energy.
a Square root transformed.b Log10 transformed.c For clay variables.d For sand and silt variables.e For solum thickness, max indicates maximum content; DWT indicates depth-weighted average content.

Table 3 .
Spearman rank correlations between soil physical properties and TPE and age.

Table 4 .
Sensitivity analysis of model prediction in complex terrain.