## Abstract

The wireline gamma-ray log is sensitive to open-hole conditions and, in particular, the diameter. This means that the log can jump at casing points. Although environmental corrections exist, they can fail at these points. We present a Bayesian method for deriving a new quantity – the shifted gamma–ray index – that takes these shifts into account by fitting a piecewise linear function to open-hole data in a depth window around the casing point. Because it is Bayesian, the method enables us to assess our uncertainty about its performance. This method requires very little knowledge of the borehole or drilling conditions but relies on the assumption that the lithology is consistent. Investigating the other wireline logs enables us to assess whether this assumption is valid. We demonstrate our method using well data from offshore mid-Norway.

## Introduction

In the petroleum industry, the gamma-ray (GR) log is used routinely to estimate the shale volume (*V*_{sh}), which is the percentage of total rock volume made up of shale. This quantity is particularly useful in a conventional pore-pressure prediction workflow, where a normal compaction trend will be defined using data from shales only, usually sonic transit time (DT), density (RHOB), resistivity (RES) or seismic interval velocity. For more detail on pore-pressure prediction methods see, for example, Zhang (2011) or Mouchet & Mitchell (1989).

Although the GR log is usually calibrated to a standard such as API units, its value depends on the position of the sensor, the dimensions of the borehole and various other factors, and so it is not necessarily a precise expression of the level of GR activity of the formation. The behaviour of the whole log is therefore used, often by standardizing the log to [0, 1], creating a quantity called the gamma-ray index, *I*_{GR}.

To be able to use the GR log to estimate the lithology, we must account for the effects of these other factors. The effects can be especially significant at casing points. At these depths, drilling is stopped and casing is run and cemented, hydraulically isolating the open formations from the previous casing point to the current depth. The lowermost casing section (the shoe) is set in non-reservoir rock, often a shale, prior to starting drilling again with a smaller bit size. Different logging equipment may be used from this point, and the type or properties of the drilling mud may also be changed. For these reasons, there is commonly a sharp change in the behaviour of the open-hole GR log at a casing point. We present a method for taking into account these larger-scale changes to create a new quantity, the shifted gamma-ray index, *S*_{GR}. Our approach is based on detecting a change in mean at a known casing point, guided by prior information as to the depth of the casing point and the surrounding lithology.

In the next section we discuss the GR log in more detail, and briefly mention some existing methods for dealing with its uncertain nature. Our *S*_{GR} method is introduced in the section on ‘The shifted gamma-ray index’ later in this paper. This method makes use of the Gibbs sampler (a Bayesian sampling method), introduced in Appendix A. The subsection on ‘Fitting a piecewise linear function’ explains the probability model behind our method. In the section ‘Validating our assumptions’, we introduce some methods for validating the assumptions underlying the shifted GR index. In the section ‘Example: offshore mid-Norway’, we demonstrate the methodology using data from this region.

Ideally, wireline logs provided as digital data files will already have had environmental corrections made to them (Serra & Serra 2004) and, therefore, reapplying environmental corrections would be inappropriate. Information about environmental corrections, especially regarding the GR, is often lacking in the log header and so the recipient must seek a data-driven method to correct for shifts across casing points. The shifted GR method we present uses the data themselves to create a consistent GR index and is, therefore, of use in such situations. In the section on the example of offshore mid-Norway, we use wireline log data from digital LAS (log ASCII standard) files, not knowing whether the GR log has been corrected or not. Because the log continues through the casing changes with no break or duplication, it is clear that some data processing has taken place.

## The Gamma-Ray Log

The GR log records the natural gamma radiation from radioactive isotopes of uranium, thorium and potassium, and their daughter isotopes, present in the formation, most frequently in clay minerals. This log is almost always run on all wireline and logging while drilling (LWD) tools for lithology interpretation and correlation between logging runs. This paper focuses on the wireline GR data in open-hole conditions, as opposed to the behaviour of the GR log as it goes from open hole to cased hole. The spectral GR log gives a separate value for each decay series but the total count GR log, which will be our focus, gives their combined radioactivity. This quantity has various uses in petroleum geology – for example, in sedimentology and in correlations between wells – but our concern is its use in pore-pressure prediction. Different types of rock are radioactive to different degrees. Among sedimentary rocks, shales tend to have the highest levels of radioactivity, so the GR is often used to interpret shale from other lithology (Rider 1990; Thibal *et al*. 1999; Serra & Serra 2004).

The gamma ray index (*I*_{GR}), defined as:

is suggested by Asquith & Krygowski (2004) as a primitive but simple value for the shale volume. This relies on the assumption that, although the GR values vary for different shales, within any particular well one would expect the GR value of a pure shale to be constant (Rider 1996). Setting *V*_{sh} = *I*_{GR} also implies that the shale content is proportional to the GR emissions, which may not be the case. Some non-linear calibration curves defining *V*_{sh} as a convex function of *I*_{GR} have been developed for formations of different ages (Bhuyan & Passey 1994; Rider 1996), and other relationships have also been proposed for *V*_{sh}.

In many standard pore-pressure prediction workflows (e.g. Eaton 1975), a normal compaction trend must be defined, showing the decrease in shale porosity with depth under normal compaction where the pore pressure is hydrostatic. The porosity in the formation is then compared to this curve to estimate pore pressure (Mouchet & Mitchell 1989). Since the porosity of a shale cannot easily be measured directly, data such as compressional sonic transit time (DT), deep resistivity (RES) and formation density (RHOB), as well as seismic interval velocity, are used as proxies for porosity. Where, for instance, high sonic transit times are measured by a wireline tool relative to the compaction trend, anomalous pore pressure is inferred.

The normal compaction trend is based on the assumption that shale lithology is consistent to enable the selection of shale intervals from the wireline log data (Swarbrick 2002). The shale volume is usually used to do this, and so estimating *V*_{sh} is a critical part of the shale pore-pressure prediction workflow. Thus, we must understand the GR log, where this is used as the primary shale indicator.

The GR log is very sensitive to borehole conditions and tool configuration (Maučec *et al*. 2009; Mendoza *et al*. 2006), and, most of all, to borehole diameter (Chen 1998). This can create large jumps in the GR log, particularly at casing points where the borehole diameter changes suddenly, leading to a misrepresentation of shale in the rock column, and potential inaccuracy in the final pressure prediction model. If too much shale is inferred then data that represent sands, for instance, may be incorporated in defining the normal compaction trend. This would lead to an overestimation of the pore pressure, as the normal compaction curve would plot at a faster velocity. Conversely, if too little shale is modelled, then some key shale intervals may be assigned as sand and not used, and so degrade the definition of the compaction model. The variation in the GR from changes in these conditions can be far greater than the variation owing to changes in lithology (Bristow & Williamson 1998). Before the GR log can be used to interpret the lithology, therefore, this issue must be addressed.

We review existing methods for taking into account uncertainty in the GR log before introducing our own Bayesian method, the shifted GR index.

### Existing methods for correcting the GR log

Methods exist for correcting the gamma-ray log to account for specific conditions but they are mostly empirically derived, and include variables that may not be known. Serra & Serra (2004) gave correction formulae for different Schlumberger logging tools and different borehole conditions, which include parameters relating to the casing (if present), pore fluid, mud properties, tool position within the borehole and borehole diameter. These parameters are used to calculate a parameter, *t*, and the correction factor is read from a plot showing correction factor against *t* for different tools, positions within the borehole (centred and eccentred) and drilling mud types. Note that since the shifted GR method concerns only open-hole data, casing and cement effects are irrelevant.

Lehmann (2010) collected together GR correction formulae that are used in the uranium mining industry. Each of these equations calculates a scale factor, *Y*, which is then applied to the raw data, such that GR_{cor} = *Y* GR_{raw}, where GR_{cor} is the corrected GR value. Different formulae are presented for *Y*, depending on whether or not there is casing, whether the pore fluid is water or oil, and whether the GR sensor is centred or eccentred. The equations for *Y* involve the borehole fluid density and the borehole diameter, and each contains several empirically derived constants. Stromswold & Wilson (1981) also presented correction formulae for the GR log, taking into account the variation in borehole fluid and diameter, and the presence of steel casing. Weatherford (2009) presented GR corrections as graphs.

Although philosophically it is appealing to derive equations using physical reasoning, the exact values of the borehole geometry, drilling conditions, tool position and fluid properties may not be known, and their effects will not necessarily be as simple as assumed in the derivation of such equations. Furthermore, the correction methods discussed aim to realign the GR log to calibrated data (Stromswold & Wilson 1981; Lehmann 2010). The method we present in the following section aims simply to realign the GR log so that it is self-consistent and suitable for identifying zones of similar shale content to create normal compaction curves.

Since we do not know the tool type, position or diameter for the data used in the example, or, indeed, whether the data supplied have already been environmentally corrected as expected, we are unable to use the methods mentioned above to form a comparison. Because steps in the GR log response across casing points remain, even if environmental corrections have been applied, some form of shift correction is still necessary for consistency in zonal identification.

## The Shifted Gamma-Ray Index

We propose a method that requires very little knowledge of the borehole conditions but, instead, relies on some assumptions. For the method to be useful, these assumptions need to be reasonable, and we will look into ways of verifying them using other wireline logs.

The aim of this method is to account for the effect on the GR log of changes in borehole diameter and, possibly, in drilling mud and logging tool calibration that take place at casing points. The main underlying assumption is that the casing is set in a massive shale. This seems reasonable as Leak-Off Tests, which can only be performed in rocks of low porosity and permeability, are often performed just below the casing shoe (Mouchet & Mitchell 1989), and it is desirable to change casing in a competent shale (Devereux 1998). In the well-planning stages, casing point selection can be informed by seismic data (Littleton *et al*. 2002) or through data from offset wells (Clouzea *et al*. 1998).

However, the assumption of a homogeneous shale could be verified by checking the mud log and using other wireline logs, such as RHOB and neutron porosity (NPHI), that are sensitive to lithology but less affected by casing points. The RHOB log is sensitive to borehole rugosity but is always accompanied by a caliper log and a delta-RHOB, both of which are useful for assessing the quality of the measurement. If NPHI is available, then NPHI and RHOB can be used to interpret lithology, either to confirm or to contradict the assumption of a homogeneous shale. We will demonstrate some approaches in the section on ‘Validating our assumptions’ later.

At each casing point (at depth *z*_{cas}), we fit a piecewise linear function like that in Figure 1 to the gamma-ray data.

This is a simple model we use to estimate the effect of the casing point on the GR log, and the form was decided after studying GR logs that showed clear signs of being affected by casing points. Such signs can be seen in the data used for the examples, where there are shifts in the GR logs around casing points in three offshore mid-Norway wells. The important features of the fitted function are the values of the open-hole GR log before and after the casing point, and the transition zone: the region between *z*_{dtop} and *z*_{dbot}, where the GR log is not at either stable level. The transition zone was introduced rather than an abrupt jump from one level to the next because, in most cases, the data showed a gradual change over the casing point from one level to another, although this zone may often be very narrow. This may be a shoulder effect, as described by Rider (1996), or the result of data processing such as splicing or depth matching the different log runs. Generally, the casing shoe is set a few metres above current bottomhole depth, leaving a small section of open hole often called the ‘rathole’. The next section of borehole is then drilled with a smaller bit size. The transition zone should, therefore, cover this small section, the ‘rathole’, and only settle at a constant rate once the borehole is at the smaller diameter.

Theoretically, one could rescale the data in the transition zone using the fitted curve; however, there are several problems with this. The transition-zone part of the fitted curve has its own variance ( in what follows), which may be large if the data in the transition zone are highly variable or non-linear. This is a desirable property, allowing flexibility between the two constant portions of the curve whilst maintaining simplicity in the model. However, the poor fit of the curve would be likely to make it inadequate to transform the data. It is also common for there to be caving in the borehole immediately below the casing point, and this would cause the data to be of poor quality. One could investigate the caliper log to see whether this was the case. Instead, we use the mean GR found by the model to connect the two constant portions.

The user is able to restrict the width of the transition zone, and also to assess whether the model fit is appropriate. It should be emphasized that this method involves subjective choices to be made by the user. The nature of these choices is made clear in the following subsection (‘Fitting a piecewise linear function’), and some examples of how the results depend on of the user’s specifications are shown in the section ‘ Example: offshore mid-Norway’.

The curve begins at *z = z*_{top}, and has a constant value
until *z = z*_{dtop}. From *z = z*_{dbot} to *z = z*_{bot}, the curve has constant value *γ*_{1}+*θ*. For
, the function changes linearly from *γ*_{1} to *γ*_{1}+*θ*. We refer to this region as the ‘transition zone’. Figure 2 shows an example using some real data.

In order to fit this curve to real data, we must estimate the parameters *γ*_{1}, *θ, z*_{top} and *z*_{dbot}. The depths *z*_{top} and *z*_{bot} will be fixed but should not be outside the shale interval, and *z*_{cas} will be given. We use the depth of the casing shoe for *z*_{cas}, as this is the one most often given in well reports, although it is likely to be slightly shallower than the depth at which the borehole diameter changes. This is apparent in the examples, as *z*_{cas} is often very near the start of the transition zone.

In allowing a transition zone of uncertain width, this model extends from the problem of estimating the mean shift in a known-changepoint time series. Krishnaiah & Miao (1988) reviewed least-squares regression, maximum-likelihood and Bayesian approaches to that problem. For our extension, we choose a Bayesian approach as it is straightforward to apply, appears to work well and offers probabilistic updates of uncertainty for all parameters.

### Fitting a piecewise linear function

The model described here is for the data around one casing point, at depth *z*_{cas}.

The algorithm requires some starting values:

*d*_{min}: the minimum length of the constant sections (i.e. of*z*_{dtop}*–z*_{top}and*z*_{bot}–*z*_{dbot}), which restricts the width of the transition zone;*d*_{max}: the distance from*z*_{cas}to*z*_{top}and*z*_{bot}(assumed to be the same);*a, b, µ*_{p}: hyperparameters for the prior distributions below.

The value of *d*_{max} should be chosen to cover an interval above and below the casing point in which the GR log appears to be approximately constant. In the examples that follow we used *d*_{max} = 40 m unless otherwise stated. The parameter *d*_{min} broadly controls how many observations are used to estimate the means for the constant segments preceding and succeeding the casing point. After some experimentation, we found that a value of 3 m worked well for all of our examples. A reasonable starting choice for the parameter *µ*_{p} is the mean of the GR log between *d*_{min} and *z*_{cas}. In practice, the results we obtained were not sensitive to the value of *µ*_{p} when it was varied between 0 and 100. We used the values *a* = 1 and *b* = 1 for all examples, to provide a non-informative prior distribution (Carlin & Louis 2009).

We first establish a model for the likelihood:

This stipulates that the GR data will be normally distributed about the curve we have described, and that the variances in the different sections (*z*_{top} to *z*_{dtop}, *z*_{dtop} to *z*_{dbot} and *z*_{dbot} to *z*_{bot}) may differ from one another. These variances are given by the precisions
and
, and are considered unknown. Prior distributions must also be specified, and we use:

The top two distributions give equal probability to every permissible depth for *z*_{dtop} and *z*_{dbot}. The constants *k*_{1} and *k*_{2} are such that the probabilities sum to 1. The precisions
and
are forced to be positive by a Gamma distribution, which has density function:

where the centre and spread are determined by the hyperparameters *a* and *b*. A prior mean must be given for *γ*_{1} but, *a priori*, we expect *θ* = 0. This would be simple to change.

Our aim is to combine data with this probability model to find the posterior probability distributions of the parameters (i.e. the distribution conditional on the data). Analytically, this would be impossible, and so we use a Bayesian sampling tool, the Gibbs sampler, which we describe in Appendix A. The Gibbs sampler combines our prior model with the data (gamma-ray data around a casing point) to produce samples from the posterior distributions of the model parameters *γ*_{1}, *θ, z*_{dtop}, *z*_{dbot}, *τ _{v}, τ_{w}* and

*τ*.

_{z}At each casing point, the data in the transition zone are removed, and the GR log below the transition zone is shifted by the estimate of *θ*. Finally, as the absolute value of *θ* is arbitrary, we find *S*_{GR} by rescaling it to the interval [0, 1], so that *S*_{GR} is analogous to *I*_{GR}.

Mathematically, this is no different from varying GR_{sand} and GR_{shale} above and below each casing point, which is common practice, but this method is not prone to arbitrary choices of the values and enables uncertainty analysis on the resulting log.

It may seem appealing to have a multiplicative element to the model. Indeed, many of the correction formulae given by Lehmann (2010), Weatherford (2009) and others are multiplicative. With this in mind, a multiplicative probability model was specified, where the constant part of the curve from *z*_{dbot} to *z*_{bot} was *αγ*_{1}+*θ*. In this data-based approach, the only quantity with which we can estimate the scaling parameter *α* is the variance of the GR log. Therefore, rather than having three separate and independent variances, as in the probability model above, there was one variance, τ _{-1}, which was then scaled by *α*.

When this model was applied to real data it was clear that too much weight was being given to the variance, with previously significant features of the GR logs being lost. The additional parameter also made convergence of the Gibbs sampler much slower.

As stated in the Introduction, the wireline logs in the digital data files should have already have had environmental corrections made to them. In the case of the GR log, these would be likely to be multiplicative as we have seen.

## Validating Our Assumptions

According to Rider (1996), RHOB, DT, NPHI and photo-electric effect (PEF) logs can all be used quantitatively (or semi-quantitatively) to interpret lithology. Therefore, we will use these logs, where available, to validate (or expose as false) our assumption that the lithology around the casing point is homogeneous. The GR log is among the most available of the wireline logs, and is usually recorded over almost the entire well, along with the resistivity, sonic transit time and caliper logs. The caliper log, in particular, is useful for quality control since if it shows rugosity in the borehole then the other wireline logs are likely to be affected. Neutron porosity and density logs are typically run over shorter intervals around the reservoir. It may, therefore, be that over some casing points we have a limited selection of logs to study.

The first step will be to confirm that the casing point does not affect each log. We will do this by using the same function-fitting procedure as for the GR, except that now our concern is whether the posterior estimates for *θ* differ significantly from 0.

Secondly, assuming no casing point effect has been found, we must use the logs to interpret the lithology around the casing point. Hearst *et al*. (2000) suggested cross-plotting NPHI against RHOB. Where these logs are available we will do this, to check that all logged depth points around the casing shoe lie in the same lithological region of the NPHI–RHOB cross-plot. We will use some of these methods on real data in the example.

## Example: Offshore Mid-Norway

To demonstrate *S*_{GR} in practice, we will use some wells from offshore mid-Norway, the GR data and casing points from which are shown in Figure 3. Following this, we will use the other logs, as described in the ‘Validating our assumptions’ section, to support our use of *S*_{GR}.

We do not know what environmental corrections, if any, have already been made on these GR logs, and this gives another reason for preferring our data-based approach to a deterministic one. In this section, we demonstrate the shifted GR method, and following this we go on to use other available wireline log data to validate the assumption that the casing change is in a consistent shale, as described in the section on ‘Validating our assumptions’.

### Well 6406/1-1

For well 6406/1-1 there are two casing points where we have sufficient GR data, and these are at 3008 and 4611 m RKB. Figure 4 shows samples from the posterior distribution at these points. The input parameter values are as given in the subsection on ‘Fitting a piecewise linear function’, and three parallel chains were used in the Gibbs sampler.

The left-hand plots in each case show the GR data (black points), the casing point depth (vertical black line) and a random sample of 100 fitted curves from the 3000 samples of the posterior distribution given by the Gibbs sampler. We ran three parallel chains (see Appendix A) in each case, giving three independent estimates of all model parameters. The density plots on the right in Figure 4 show the posterior densities for the shift parameter *θ*, with different curves indicating the three independent estimates. Table 1 gives summaries of the posterior distribution of each parameter.

In all cases the model is clearly working in terms of finding suitable fits to the data. The posterior distributions for each chain, shown by different lines in the density plots, match fairly well, giving us confidence that the Gibbs sampler has converged.

The method can be sensitive to the values of *d*_{max}, *d*_{min}, *a, b* and *μ*_{p}. For example, Figure 5 shows the fitted functions for two different values of *d*_{max}. The light grey lines were found using *d*_{max} *=* 40, as in Figure 4a. The dark grey lines were found by reducing *d*_{max} to 20. This removes most of the GR log below *z*_{cas} that is at the stable level, and so the model very easily fits the level using data that should be part of the transition zone. The end of well report shows that the mud type was changed at 3016 m, and this can sometimes cause a jump in the GR log, as it appears to here.

If the number of iterations were high enough, all three chains would converge to the same distribution but this early result indicates that this distribution would not be appropriate.

Figure 6 allows us to compare the final result (Fig. 6b) with the input GR (Fig. 6a). Figure 6a also shows the lithology data available, with horizontal dark grey lines denoting shale and light grey denoting sandstone. The lithological markers were obtained mostly from cutting samples and, while there may therefore be some error in their depths, the shales appear consistent. Again, we are reassured that the casing points are in shales.

### Well 6506/11-7

For well 6506/11-7 there are three casing points where we have sufficient GR data, and these are at 1387, 2700 and 4312 m RKB. The GR log behaviour at 2700 m is somewhat surprising, as ordinarily one would expect an increase in GR with a smaller borehole diameter. However, the mud log shows that the mud type was changed at 2710 m, from a KCI/polymer/glycol mud to a Versapro mineral oil-based mud.

Figure 7 shows the posterior samples for the fitted functions (left) and posterior densities for *θ* (right) at each of these casing points. Table 2 summarizes the posterior distributions for each parameter at each casing point.

Again, the Gibbs sampler chains appear to have converged, and the fitted curves appear to fit the GR data well.

Figure 8 demonstrates another feature of this shifted GR method, which is that it will not necessarily produce distributions that focus on one curve. Here, the posterior distribution appears to favour two different values of *z*_{dbot} more or less equally. For this reason, the posterior mean for *z*_{dbot} in Table 2 lies between the two values, and the SD and range are large.

This is not necessarily a problem, particularly since here it results in only a slight difference in the width of the transition zone (the posterior values for *γ*_{1} and *θ* are still very similar). If the difference was more serious, for example in the value of *θ*, then one could look further into the other wireline logs and the well report to try to understand whether one solution is more realistic.

Figure 9 compares the input GR (left, with lithology data also shown) with the mean *S*_{GR} found using our method.

### Well 6406/2-7

For well 6406/2-7 there are four casing changes with GR data, at 379, 1399, 2707 and 4458 m. The posterior function fits are shown in Figure 10, along with posterior densities for *θ*. Table 3 summarizes the posterior densities for each parameter at each casing point.

Some of the shifts in GR are much more gradual in this well, particularly the shift at 379 m. Figure 11 compares the smoothed GR data with the mean *S*_{GR}, this time without lithology data.

### Example: Validating our assumptions

To verify that the shifted GR model is appropriate, we use some of the approaches mentioned in the earlier ‘Validating our assumptions’ section on the data from the three mid-Norway offshore wells to see whether the lithology around each casing change is consistent. There is often little choice of wireline log in the shallower sections of the well, and so the choice of logs depends on which logs are available.

#### Well 6406/1-1

The only other wireline log available around 3008 m is resistivity. Hearst *et al*. (2000) stated that resistivity does change with lithology, even though this change is complicated by other factors, such as fluid content. Figure 12 shows the resistivity around 3008 m, with a fitted linear model of resistivity against depth shown in blue. The gradient coefficient in the linear regression of resistivity against depth is estimated to be −2.8 × 10^{-4} ohm m m^{−1}, which is not significant under the linear regression model (the *t* statistic is −1.7, which has a *p* value of 0.09). This supports the notion that the lithology is consistent, and the resistivity values themselves are consistent with shale.

Figure 13 shows the density log around the casing change at 4611 m in well 6406/1-1, with the posterior samples of the fitted model function, and the posterior samples of *θ*.

The density log does not show any sign of change over this region and, again, this supports a consistent lithology.

#### Well 6506/11-7

Most of the wireline logs were recorded only below the casing point at 1387 m, so, again, we have only resistivity for the shallowest casing point. Figure 14 shows the resistivity around the casing point, with a linear regression fit in blue. Again, the coefficient of depth in a linear regression of resistivity against depth is non-significant (the *t* statistic is −1.6, which has a *p* value of 0.11).

Figure 15 shows the neutron porosity (in limestone units) and density logs around the casing point at 4312 m separately, and both show slight trends with depth.

Figure 16 shows the neutron porosity–density combination plot around the casing change at 4312 m in well 6506/11-7, as in Rider (1996). The alignment of the scales results in both curves overlapping in water-filled limestone porosity. Below 4292 m, the data appear to be shale as indicated by the characteristic separation between the curves, with NPHI reading a considerably higher porosity than RHOB.

#### Well 6406/2-7

Again, for the casing changes at 379 and 1399 m there are few wireline logs but below this depth both neutron porosity and density are available. Figure 17 shows the neutron porosity and density plotted against depth around the casing points at 2707 and 4458 m.

Although the data are not in the same regions of the plots, the separations between the logs are consistent with shale, and show similar lithology above and below the casing point.

## Discussion

The problem that we have addressed is that of detecting a shift in mean response at a specified casing point. In the statistical literature, this is a changepoint problem (Krishnaiah & Miao 1988), with the added complication that in all the examples we have seen that the change in shift is not instant but takes place over an interval or transition zone. These zones vary in length from example to example. Moreover, the observations within the transition zone tend to transit to the new level unpredictably, and with greater variability than the preceding and succeeding observations. The typical industry approach to this problem is to make a ‘by eye’ adjustment for shift and to ignore the transition zone. The method described here is a statistical formalization of that process.

We adopt a Bayesian approach as the most appropriate, extending from the changepoint problem discussed by Carlin *et al*. (1992), who also gave a brief introduction to the Gibbs sampler. One advantage of the Bayesian approach is that it can incorporate expert judgements as to the process – for example, in the depth of the transition zone. Another advantage is that it can handle uncertainties for the different kinds of quantities involved: depth of transition zone, variation in mean shift and so forth. The results produced are posterior probability distributions for these quantities.

The Bayesian approach requires parameters to control the estimation process. These can be chosen subjectively according to the problem at hand but we also suggest defaults that we have found to work well in practice. Typically, the defaults are based on non-informative prior distributions, where we would expect data to dominate the prior formulation very quickly.

The use of a Markov Chain Monte Carlo mechanism such as the Gibbs sampler, as described in Appendix A, is standard in such settings. The algorithm we present is implemented in the statistical language R (R Development Core Team 2011), making use of R packages rjags (Plummer 2014) and R2Jags (Su & Yajima 2014). All of our examples run automatically with default parameters and take a few minutes each on a desktop PC. In addition to providing a formal statistical basis for the alignment of GR segments between casing points, this method should provide results that are faster, more reliable, more automatic and less biased than those produced by manual manipulations of the GR log. Further, the procedure is transparent, and with assumptions that may be challenged and pursued via diagnostics at a finer level of detail, if so desired. We would expect such a tool to be used with expert oversight rather than to provide unchecked fully automated realignments.

Except diagnostically, we do not presently make use of uncertainties attached to the mean shift or to segments joining the casing points. However, elsewhere we are constructing stochastic dynamic Bayesian networks that model pore pressure layer by layer as we descend through the earth. The shifted GR index is an important ingredient to this modelling process. The availability of uncertainties attached to features in the shifted GR index is essential in that context.

## Summary

We have presented a Bayesian, data-based approach for accounting for uncertainty in the open-hole GR log around casing changes. This method relies on the assumption that the lithology around the casing point is consistent, and we have shown some methods for validating this using other wireline logs. The shifted GR method does not rely on any parameters relating to borehole conditions, mud type and so on, but does require careful consideration from the user. Using the posterior distributions obtained through the probability model, we are able to asses our uncertainty about the fit (given the model). Although the mathematics is unavoidably complex, they can be implemented in software, and an R package is in development for release to the community.

## Acknowledgments

With thanks to the sponsors of the GeoPOP3 project, BG, BP, Chevron, ConocoPhillips, DONG Energy, E.ON, Eni, Petrobras, Petronas, Statoil, Total and Tullow Oil, for financial support, and also to our GeoPOP3 colleagues for their advice and support. We are most grateful to a referee and the editor for comments that allowed us to clarify aspects of the paper.

## Appendix A: The Gibbs sampler

When trying to solve a problem in the Bayesian framework, the desired end result is usually a marginal posterior distribution. For instance, in the GR shift problem described in this report, we would like to know the marginal posterior distribution for *θ*, the probability of *θ* conditional on the data. Finding this distribution analytically would involve some prohibitively complicated integration. The Gibbs sampler provides a way to obtain random samples from posterior marginal distributions without ever having to solve these integrals, by sampling from the conditional distributions.

The Gibbs sampler has been popular since the 1980s, and has been used in a wide array of problems (Gelfand *et al*. 1992). It is an iterative sampling scheme and proceeds in the following way. We write
to denote the collection of all of the model parameters, and *D* for the data. We start with an arbitrary set of values
. We then draw
from the distribution:

and then :

and so on until we have drawn from:

This gives us a new set of values **Θ**^{(1)}. We repeat this process, in general sampling
from
until we have several thousand or more such **Θ**^{(j)}. Under fairly general conditions, the distribution of these samples tends towards the true posterior distribution. Although eventual convergence can be proved, it is not possible to calculate how many iterations there should be before it is reached. Therefore, several precautions are often taken (Carlin & Louis 2009).

**Burn-in**: the first*m*sets are discarded, to remove dependence on the initial value**Θ**^{(0)}.**Parallel chains**: several independent sequences are generated, rather than just one. In practice, this means re-running the full estimation procedure several times. If the results have very similar properties, this suggests the samples can be trusted.**Thinning**: this deals with the problem of the chain getting stuck in particular areas, and not exploring the full probability space properly, which is particularly a problem where there are local maxima separated by regions of very low probability. The sequence of values**Θ**^{(j)}is ‘thinned’ by keeping only every*n*th set.

Once the final collection of samples has been found, samples of the marginal posteriors are given by the values of each parameter. Population properties can be estimated using the samples; for example, the mean of *θ*_{1} can be approximated by the mean of the sample values of *θ*_{1}, and so on. More details of the mathematics behind the Gibbs sampler are given in Casella & George (1992), and Gelfand *et al*. (1992) give an example. The problem we describe is very similar to the changepoint problem discussed by Carlin *et al*. (1992), who also give a brief introduction to the Gibbs sampler.

- © 2014 EAGE/The Geological Society of London