1 From a Posterior with Gaussian Distributions to a Least-Squares Objective Function
In this appendix, we develop the operations necessary to go from a posterior distribution formulation to an objective function in the case of the Tikhonov inverse problem.
We stated in section 2 that a least-squares term corresponds to a Gaussian prior. For a vector of parameters, the multivariate Gaussian distribution, denoted by N, with mean μ and covariance Σ (denoted σ2 for a unidimensional Gaussian, with σ being the standard deviation) is:
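For a parameter vector m of length n, the multivariate Gaussian density with mean μ and covariance Σ takes the standard form:

```latex
\mathcal{N}(\mathbf{m} \mid \boldsymbol{\mu}, \Sigma) =
\frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}
\exp\!\left(-\frac{1}{2} (\mathbf{m} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{m} - \boldsymbol{\mu})\right)
```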
Going from a posterior distribution to an objective function is done by taking the negative natural logarithm of the posterior. Taking the negative logarithm of a single Gaussian term gives:
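Written out for a multivariate Gaussian with mean μ and covariance Σ, the negative natural logarithm is:

```latex
-\log \mathcal{N}(\mathbf{m} \mid \boldsymbol{\mu}, \Sigma) =
\frac{1}{2} (\mathbf{m} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{m} - \boldsymbol{\mu})
+ \frac{1}{2} \log\!\left((2\pi)^{n} |\Sigma|\right)
```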
The first term obtained in equation 2 is recognizable as a least-squares misfit. The second term does not depend on m and is thus a constant. As constants play no role in the optimization, this term is discarded in the objective function.
For the Tikhonov geophysical inverse problem, the full posterior distribution P(m∣dobs) can be written, using Bayes' rule, as:
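A form consistent with the surrounding text, writing the prior as the product of a smallness term P_s(m) and a smoothness term P_L(m) (this notation is assumed here for illustration), is:

```latex
\mathcal{P}(\mathbf{m} \mid \mathbf{d}_{obs}) =
\frac{\mathcal{P}(\mathbf{d}_{obs} \mid \mathbf{m})\,
\mathcal{P}_{s}(\mathbf{m})\,
\mathcal{P}_{L}(\mathbf{m})}
{\mathcal{P}(\mathbf{d}_{obs})}
```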
where the finite-difference operator L summarizes the first or second derivatives in all directions with their respective {α} scaling weights. We express the data misfit, the smallness, and the smoothness probability functions, respectively, as Gaussian distributions:
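With F denoting the forward operator, m_ref the reference model, and Σ_d, Σ_s, Σ_L the respective covariances (these symbols are assumed here for illustration), the three Gaussian distributions can be written as:

```latex
\mathcal{P}(\mathbf{d}_{obs} \mid \mathbf{m}) = \mathcal{N}\!\left(\mathcal{F}[\mathbf{m}] \mid \mathbf{d}_{obs}, \Sigma_d\right), \quad
\mathcal{P}_{s}(\mathbf{m}) = \mathcal{N}\!\left(\mathbf{m} \mid \mathbf{m}_{ref}, \Sigma_s\right), \quad
\mathcal{P}_{L}(\mathbf{m}) = \mathcal{N}\!\left(L\mathbf{m} \mid \mathbf{0}, \Sigma_L\right)
```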
The objective function is obtained by applying the negative natural logarithm to the posterior distribution described in equation 3. The summation form is a direct consequence of the fundamental property of the logarithm: multiplication becomes addition. It follows:
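Under these Gaussian assumptions, and with the forward operator F, reference model m_ref, and covariances Σ_d, Σ_s, Σ_L (symbols assumed here for illustration), the negative logarithm of the posterior takes the familiar Tikhonov form:

```latex
-\log \mathcal{P}(\mathbf{m} \mid \mathbf{d}_{obs}) =
\frac{1}{2} \left\| \mathcal{F}[\mathbf{m}] - \mathbf{d}_{obs} \right\|_{\Sigma_d^{-1}}^{2}
+ \frac{1}{2} \left\| \mathbf{m} - \mathbf{m}_{ref} \right\|_{\Sigma_s^{-1}}^{2}
+ \frac{1}{2} \left\| L\mathbf{m} \right\|_{\Sigma_L^{-1}}^{2}
+ \text{constant}
```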
The constant term in equation 9 contains the constant terms for each Gaussian distribution and the constant term log(P(dobs)).
This completes the detailed operation to go from a posterior distribution to an objective function formulation of the Tikhonov inverse problem.
2 Conjugate versus semi-conjugate priors on the mean and variance of a single univariate Gaussian distribution
In this section, we elaborate on the difference between a full conjugate and a semi-conjugate prior approach for estimating the parameters of a single, one-dimensional Gaussian distribution. After giving the necessary definitions for the full conjugate prior and describing how the MAP-EM updates are affected, we illustrate the difference between the semi- and full conjugate priors through the example displayed in Fig. 1.
We first need some definitions. The full conjugate prior for the mean and variance follows a Normal-Inverse-Gamma distribution (Murphy, 2012). As in section 3.2, we use the Normal-Inverse Chi-Squared re-parameterization:
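In the standard parameterization of Murphy (2012), with prior mean μ0, prior strength κ0, degrees of freedom ν0, and prior variance σ0² (hyper-parameter names assumed here), the Normal-Inverse Chi-Squared prior factors as:

```latex
\mathcal{NI}\chi^{2}\!\left(\mu, \sigma^{2} \mid \mu_0, \kappa_0, \nu_0, \sigma_0^{2}\right)
= \mathcal{N}\!\left(\mu \mid \mu_0, \sigma^{2}/\kappa_0\right)\,
\chi^{-2}\!\left(\sigma^{2} \mid \nu_0, \sigma_0^{2}\right)
```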
The update of the means in the MAP-EM algorithm stays the same as for the semi-conjugate prior approach, as detailed in equation 32. In contrast, the update of the variances now requires an additional term s_{μ̄_j}^{(k)} to account for the difference between the observed and the prior means:
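For reference, the standard posterior update of the variance parameters under the full conjugate prior (Murphy, 2012), written for a single Gaussian with n observations x_1, …, x_n and sample mean x̄ (notation assumed here), makes the mean-discrepancy term explicit:

```latex
\nu_n = \nu_0 + n, \qquad
\nu_n \sigma_n^{2} = \nu_0 \sigma_0^{2}
+ \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^{2}
+ \frac{n\,\kappa_0}{\kappa_0 + n} \left(\bar{x} - \mu_0\right)^{2}
```

The last term grows with the squared difference between the observed and prior means; it is this term that inflates the full-conjugate variance estimate.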
We illustrate those concepts with Fig. 1. Let us consider the following setup. We have a set of observations (in our framework, this was our geophysical model), represented through their histogram in blue. The Gaussian distribution in blue represents the MLE parameters (estimated without any prior information). We now add prior information in the form of a Gaussian distribution, shown in gray in Fig. 1. We set the confidence parameters κ and ν to unity, meaning that we have equal confidence in the observations and the prior (ζ is irrelevant here as we have only one Gaussian distribution). This equal confidence can be represented by drawing as many samples from the prior as there are in the observed set. The histogram of this synthetic prior sample set is shown in gray.
The full conjugate prior approach can be understood as fitting a Gaussian distribution to the dataset formed by merging the observed and synthetic observations; this is represented in Fig. 1 in red. The full conjugate MAP distribution, also in red, is well centered between the observed and prior distributions, as expected, and so is the semi-conjugate distribution (in black). However, the variance of the red histogram, and thus of the posterior distribution with the full conjugate prior, is higher than the variance of either the observed or the prior distribution. This is due to the difference in the means of the two distributions, which the full conjugate prior approach accounts for (see equations 12 and 14). The semi-conjugate prior approach considers the means and variances independently (see equations 30 and 31). The MAP mean estimates are the same for both priors.
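The variance inflation described above can be reproduced numerically. The sketch below (a minimal illustration with hypothetical means and sample sizes, not the paper's actual data) merges an "observed" sample set with an equally sized synthetic prior set, mimicking κ = ν = 1, and compares the merged-fit variance with a semi-conjugate-style average of the two variances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: observed values (standing in for the geophysical model)
# and equally many synthetic samples drawn from the prior Gaussian.
# Equal sample sizes mimic equal confidence (kappa = nu = 1 in the text).
observed = rng.normal(loc=0.0, scale=1.0, size=1000)
prior = rng.normal(loc=3.0, scale=1.0, size=1000)

# Full conjugate MAP ~ fitting a single Gaussian to the merged dataset.
merged = np.concatenate([observed, prior])
mu_full, var_full = merged.mean(), merged.var()

# Semi-conjugate style: means and variances treated independently,
# without the mean-discrepancy term.
mu_semi = 0.5 * (observed.mean() + prior.mean())
var_semi = 0.5 * (observed.var() + prior.var())

# The difference in means inflates the merged (full conjugate) variance
# beyond either individual variance; the MAP means coincide.
print(var_full > max(observed.var(), prior.var()))  # True
print(np.isclose(mu_full, mu_semi))                 # True
```

With the two means 3 standard deviations apart, the merged variance is roughly 1 + (3/2)² ≈ 3.25 times the individual variances, which is exactly the behavior visible in the red histogram of Fig. 1.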
The semi-conjugate prior approach thus seems the better choice in the context of geophysical inversion. As our goal with this framework is to differentiate various geological units by guiding the geophysical model to reproduce certain features, the full conjugate prior approach is sometimes detrimental, as it can drive the variance further from the prior than what is currently seen in the model at each iteration.
We give in algorithm 1 the details of our implementation of this framework. The optimization notions of inexact Gauss-Newton and backtracking line search are detailed in Ascher & Greif (2011), or in Haber (2014) for their geophysical applications.
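To make the line-search step concrete, here is a minimal sketch of Armijo backtracking as described in Ascher & Greif (2011); the function names, the toy quadratic objective, and the parameter defaults are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def backtracking_line_search(phi, grad_phi, m, d, alpha0=1.0, c=1e-4, rho=0.5, max_iter=20):
    """Shrink the step length until the Armijo sufficient-decrease condition holds.

    phi      : objective function (e.g. a Tikhonov-style objective)
    grad_phi : gradient of the objective
    m        : current model vector
    d        : descent direction (e.g. an inexact Gauss-Newton step)
    """
    alpha = alpha0
    phi0 = phi(m)
    slope = grad_phi(m) @ d  # directional derivative; negative for a descent direction
    for _ in range(max_iter):
        if phi(m + alpha * d) <= phi0 + c * alpha * slope:
            return alpha  # sufficient decrease achieved
        alpha *= rho  # step rejected: backtrack
    return alpha

# Toy quadratic objective (hypothetical, for illustration only).
A = np.array([[3.0, 0.0], [0.0, 1.0]])
phi = lambda m: 0.5 * m @ A @ m
grad = lambda m: A @ m

m = np.array([1.0, 1.0])
d = -grad(m)  # steepest-descent direction as a stand-in for a Gauss-Newton step
alpha = backtracking_line_search(phi, grad, m, d)
print(phi(m + alpha * d) < phi(m))  # True: the accepted step decreases the objective
```

In the full algorithm, d would come from an inexact (conjugate-gradient) solve of the Gauss-Newton system rather than steepest descent.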
Figure 2 graphically summarizes in a flowchart how our framework loops over the various datasets to produce a final geophysical model with the desired petrophysical distribution and geological features.