## Abstract

Conventional simulation of fractured carbonate reservoirs is computationally expensive because of the multiscale heterogeneities and fracture–matrix transfer mechanisms that must be taken into account using numerical transfer functions and/or detailed models with a large number of simulation grid cells. The computational requirement increases significantly when multiple simulation runs are required for sensitivity analysis, uncertainty quantification and optimization. This can be prohibitive, especially for giant carbonate reservoirs. Yet, sensitivity analysis, uncertainty quantification and optimization are particularly important to analyse, determine and rank the impact of geological and engineering parameters on the economics and sustainability of different Enhanced Oil Recovery (EOR) techniques.

We use experimental design to set up multiple simulations of a high-resolution model of a Jurassic carbonate ramp, which is an analogue for the highly prolific reservoirs of the Arab D Formation in Qatar. We consider CO_{2} water-alternating-gas (WAG) injection, which is a successful EOR method for carbonate reservoirs. The simulations are employed as a basis for generating data-driven surrogate models using polynomial regression and polynomial chaos expansion. Furthermore, the surrogates are validated by comparing surrogate predictions with results from numerical simulation and estimating goodness-of-fit measures.

In the current work, parameter uncertainties affecting WAG modelling in fractured carbonates are evaluated, including fracture network properties, wettability and fault transmissibility. The results enable us to adequately explore the parameter space, and to quantify and rank the interrelated effect of uncertain model parameters on CO_{2} WAG efficiency. The results highlight the first-order impact of the fracture network properties and wettability on hydrocarbon recovery and CO_{2} utilization during WAG injection. In addition, the surrogate models enable us to calculate quick estimates of probabilistic uncertainty and to rapidly optimize WAG injection, while achieving significant computational speed-up compared with the conventional simulation framework.

Carbonate reservoirs contain a significant proportion of the world's conventional and unconventional hydrocarbon resources, commonly estimated at around 60% of global reserves (Burchette 2012). Hydrocarbon recovery in carbonates, however, is typically low, owing to multiscale heterogeneities and oil- to mixed-wet rock properties (Manrique *et al.* 2007; Mohan *et al.* 2011). Low recovery factors can further be influenced by complex connected high-permeability fracture networks that may establish preferential flow paths in the reservoir (e.g. Bourbiaux *et al.* 2002; Spence *et al.* 2014). The variability in matrix architecture and fracture network connectivity is the main reason why fractured carbonate reservoirs show a large variety of flow behaviours, leading to significant uncertainties in their evaluation, performance prediction and management (e.g. Mäkel 2007; Agada *et al.* 2016).

To account for multiple geological and engineering uncertainties, a large number of numerical reservoir simulations are typically required to adequately explore the parameter space, investigate parameter relationships and optimize hydrocarbon recovery. Sensitivity analysis, uncertainty quantification and recovery optimization for fractured carbonate reservoirs, however, are computationally expensive because of the multiscale heterogeneities and fracture–matrix transfer mechanisms that must be taken into account using numerical transfer functions and/or detailed models with a large number of simulation grid cells. This is particularly important for CO_{2} water-alternating-gas (WAG) injection, a successful EOR method for carbonate reservoirs that combines the benefits of gas injection to reduce the residual oil saturation with water injection to improve mobility control and frontal stability (Christensen *et al.* 2001; Manrique *et al.* 2007).

One efficient way of reducing the computational cost is by using data-driven surrogate modelling techniques that construct an approximation (or proxy) of the simulation response based on a limited number of simulation runs (Queipo *et al.* 2005; Simpson *et al.* 2008; Oladyshkin *et al.* 2011; Gogu & Passieux 2013). The modelling process typically involves generating an initial surrogate model with a set of full-physics training simulations. Subsequently, an approximate solution to the objective function is obtained by evaluating the data-driven surrogate. For validation purposes, approximate solutions from the data-driven surrogate are compared to model predictions using full-physics simulation (e.g. black oil or compositional simulation). If the comparison shows a mismatch, the data-driven surrogate is iteratively updated with more training runs and testing points added until the mismatch is eliminated.

In the context of EOR in fractured carbonate reservoirs, data-driven surrogates may be able to provide good approximations of time-consuming numerical simulations. The surrogate models can then help in understanding the respective dependencies and correlations of uncertain input parameters, and contribute to rapid simulation, optimization and decision making under uncertainty. Geological parameter uncertainties that affect CO_{2} WAG injection include the nature and flow significance of faults and subseismic fractures (Bourbiaux *et al.* 2002; Spence *et al.* 2014), and the role of wettability and hysteresis when controlling imbibition and drainage in the rock matrix (Larsen & Skauge 1998; Hui & Blunt 2000; Juanes *et al.* 2006). Similarly, engineering parameter uncertainties include WAG design parameters such as the flow rate and location of wells, WAG slug sizes, and WAG injection ratios.

The current paper presents results of a synergy between the design of experiments, data-driven surrogates and optimization under uncertainty. The novelty of our work is the synergistic application of the aforementioned approaches to EOR simulation and optimization for heterogeneous fractured carbonate reservoirs. Although the specific experimental design techniques (i.e. Box-Behnken, Latin Hypercube) and optimization algorithm (i.e. genetic algorithm) are not new, the application of the experimental design–surrogate workflow to the modelling of fractured carbonate reservoirs has not been previously reported. A brief overview of the state of the art for experimental design, data-driven surrogates from polynomial chaos expansion and optimization is presented in the following subsections on ‘Design of experiments’, ‘Polynomial chaos expansion’ and ‘Optimization’.

### Design of experiments

Design of experiments (DOE) is commonly used for extensive exploration of parameter spaces (Simpson *et al.* 2008; Koziel & Yang 2011). Here, DOE is employed to ensure that data-driven surrogates fully explore the parameter space and provide a robust representation of the full-physics simulation model. DOE aims to maximize the amount of information acquired from a minimum number of simulation runs by optimally allocating samples in the design space (Simpson *et al.* 2008; Myers *et al.* 2009; Koziel & Yang 2011). DOE employs different sampling methods to identify a subset of experiments from a larger set according to the number of experimental parameters under investigation.

Deterministic experimental designs, such as Box–Behnken, fractional factorial and central composite designs, are perfectly orthogonal, explore a large region of the search space and are able to capture model non-linearities (Box *et al.* 1978; Myers *et al.* 2009). To select input parameters from random distributions, stochastic samplers such as Latin Hypercube (Helton & Davis 2003) or nearly orthogonal arrays (Giunta *et al.* 2003) are frequently used. Stochastic samplers are also called space-filling designs because they are not restricted to sample sizes that are specific multiples of design parameters.

Here, we use the Box–Behnken experimental design to generate surrogate training simulations. Box–Behnken is a quadratic experimental design that assures global coverage of the parameter space at an acceptable computation cost and takes the interaction of input parameters into account. For validation of the surrogates, we generate surrogate-testing simulations using the stochastic Latin Hypercube experimental design, which can select input parameters from random distributions and explore the parameter space in a non-rigid way.

### Polynomial chaos expansion

Experimental design techniques coupled with data-driven surrogates have been widely used in hydrocarbon recovery (e.g. Friedmann *et al.* 2003; Cullick *et al.* 2006) and CO_{2} storage (e.g. Ashraf *et al.* 2013) applications for uncertainty quantification, risk assessment, optimization and history matching. One group of data-driven surrogate modelling techniques that has received increasing attention is polynomial chaos expansion (PCE) (Crestaux *et al.* 2009; Oladyshkin *et al.* 2011; Buzzard 2012; Zhang & Sahinidis 2012; Elsheikh *et al.* 2014). PCE methods build a polynomial approximation of the model response using an orthogonal polynomial basis. PCE techniques are efficient and provide a high-order accurate way of including non-linear effects in stochastic analysis (Oladyshkin *et al.* 2011, 2012).

PCE techniques are mainly classified into intrusive and non-intrusive approaches. Intrusive approaches such as the stochastic Galerkin methods (Villadsen & Michelson 1978; Ghanem & Spanos 1993; Xiu & Karniadakis 2003) require manipulation of the underlying partial differential equations that are solved within the reservoir simulator. Non-intrusive approaches do not require manipulation of the governing equations and use the reservoir simulator as a black box. Hence, they are more straightforward to apply and involve the evaluation of the coefficients in the chaos expansion using a given number of model simulations (Isukapalli *et al.* 1998; Blatman & Sudret 2010; Oladyshkin *et al.* 2011).

In this study, we focus on non-intrusive sparse polynomial chaos expansion (sPCE) and arbitrary polynomial chaos expansion (aPCE) in comparison to polynomial regression (PR). Polynomial regression estimates the coefficients for a second-order polynomial by least-squares fitting of the data-driven surrogate model to the training data (Myers *et al.* 2009). Sparse polynomial chaos (sPCE) is an extension of the generalized polynomial chaos, which is based on the Askey Scheme (Askey & Wilson 1985) of orthogonal polynomials (Xiu & Karniadakis 2003; Blatman & Sudret 2010; Elsheikh *et al.* 2014). Arbitrary polynomial chaos (aPCE) techniques have been shown to minimize the subjectivity of input data distributions by directly using the available information in a data-driven formulation of PCE and employing a global polynomial basis for arbitrary distributions of data (Oladyshkin *et al.* 2011; Ashraf *et al.* 2013).

### Optimization

In the presence of multiple uncertainties, finding the most favourable combination of uncertain input parameters to obtain an optimum value of the objective function (e.g. oil recovery, gas utilization factor) is challenging and commonly requires the application of stochastic optimization algorithms. Stochastic algorithms, including simulated annealing (Dowsland & Thompson 2012), particle-swarm optimization (Esmin *et al.* 2015), neighbourhood algorithm (Subbey *et al.* 2003), differential evolution (Eiben & Smith 2003) and genetic algorithm (McCall 2005), have been applied to many reservoir engineering problems. Stochastic algorithms incorporate a random component that allows the search during optimization to move towards worse solutions occasionally, thereby gaining the ability to seek out the global optimum objective function while escaping from local minima.

We use the genetic algorithm, a heuristic search and optimization technique based on natural evolution through selection (Bäck *et al.* 2000; Eiben & Smith 2003; McCall 2005). The algorithm uses selection, crossover, mutation and recombination of individual reservoir models to obtain a new generation of potentially superior individuals based on ranking with a fitness function (i.e. objective function: see the subsection on ‘Surrogate modelling and validation’ later in this paper). The procedure is repeated to obtain multiple generations until an optimum value of the objective function is reached. The genetic algorithm is robust, flexible and easy to adapt to different engineering problems because it uses the objective function value to determine new search steps and does not require gradient information from the optimization problem. Hence, the genetic algorithm can be applied to optimization problems for which traditional algorithms fail because of significant non-linearities or discontinuities in the search space. Several studies provide more details about the genetic algorithm and its application (e.g. Mitchell 1999; Bäck *et al.* 2000; McCall 2005; Costa *et al.* 2014).

### Objective and workflow

The aim of this study is to generate, analyse and compare non-intrusive data-driven surrogate modelling techniques, and to illustrate their application to the simulation and optimization of CO_{2} WAG injection in fractured carbonate reservoirs where multiple geological (e.g. fracture properties), physical (e.g. trapping of the gas phase) and engineering (e.g. well controls) uncertainties are encountered. We seek to show the benefit of surrogate models for faster sensitivity analysis and optimization of complex EOR methods in fractured reservoirs by overcoming challenges associated with the high computational cost of conventional simulation. Box–Behnken experimental design is used to set up a wide range of simulations of the high-resolution carbonate reservoir model. Subsequently, the simulations are used to build data-driven surrogates. For validation, additional simulations with random design parameters are set up using the Latin Hypercube experimental design and compared to the response of the data-driven surrogates for the same input parameters. The most accurate surrogate model after validation is then coupled with Monte Carlo methods to generate cumulative distribution functions of oil recovery and gas utilization. Subsequently, the selected surrogate model is employed for optimization of the objective function using a genetic algorithm.

A summary of the workflow we have used to construct data-driven surrogates for fractured carbonate reservoirs is presented in Figure 1. Input data from multiple sources such as seismic surveys, wireline logs, borehole imaging, petrophysics, core analysis, and surface and subsurface analogues are used to build a detailed geological model that is then upscaled to a full-physics finite-difference simulation model. Full-physics simulation using the minimum and maximum values of uncertain parameters is used to identify and rank input variables with significant impact (i.e. heavy hitters) on the objective function(s). The heavy hitters are then coupled with DOE techniques to generate surrogate models, which are validated before they are employed for rapid simulation, optimization and uncertainty quantification.

This paper is organized as follows. The section on ‘Reservoir model description’ describes the reservoir model, matrix properties, fracture characteristics and fluid properties employed in the full-physics flow simulations used to train and test the surrogate models. The section on ‘Set-up of the data-driven surrogate models’ discusses how these model are organized, including the screening of parameters, the experimental design, the methodology of the surrogate modelling, the validation approach and the optimization algorithm. The ‘Results’ section demonstrates the prediction of the objective function(s) with adequately trained surrogate models before describing how the goodness-of-fit measures can be used to validate the surrogates. Subsequently, the surrogates are employed for rapid uncertainty quantification and optimization. Finally, a discussion of the results and the conclusions are presented.

## Reservoir model description

### Matrix characterization and fluid properties

In this study, we use a high-resolution flow-simulation model of the Amellago Island outcrop, a middle Jurassic carbonate ramp in the High Atlas Mountains of Morocco (Amour *et al.* 2013; Agada *et al.* 2014). The outcrop can be considered as an analogue for the highly productive carbonate reservoirs of the Arab D Formation in Qatar. Data from real subsurface reservoirs were used to model porosity and permeability for the facies in the outcrop to ensure a realistic distribution of the reservoir properties, while the architectural elements of the model were obtained from the outcrop analogue. Many heterogeneous lithologies were preserved in the simulation model, including mollusc banks, mud mounds, patch reefs, and subseismic faults and fractures. A detailed description of the outcrop geology and static modeling, as well as the fracture network modelling workflow, is presented in Agada *et al.* (2014).

Owing to the large number of simulations required to generate different surrogates, a sector of the Amellago outcrop model consisting of 34 × 35 × 36 grid cells (42 840 cells in total) was used to study CO_{2} WAG injection in the heterogeneous reservoir (Fig. 2). Each grid cell had dimensions of 15 × 15 × 3 m. An inverted five-spot well pattern was used with a vertical injection well at the centre of the model and four vertical production wells at the corners. CO_{2} WAG injection was simulated using a WAG ratio of 1:1 and eight alternate 6 month cycles. The injectors and the producers were set to operate at target liquid rates subject to maximum bottom-hole pressure (BHP) constraints of 41 368 kPa and minimum BHP constraints of 16 547 kPa, respectively. The reservoir was assumed to have an initial reservoir pressure of 20 684 kPa and a bubble-point pressure of 11 367 kPa. Reference densities for CO_{2}, oil and water were assumed to be 1.35, 800 and 1000 kg m^{−3}, respectively (Table 1).

To account for rock–fluid interactions during full-physics flow simulations, two-phase relative permeability and capillary pressure curves (i.e. saturation functions) are typically utilized. Here, we use saturation functions similar to those generated by Agada *et al.* (2016) for end-member wettability scenarios (i.e. water-wet to oil-wet) for carbonate reservoirs. The two-phase saturation functions were generated with Corey relationships, which for oil–water and gas–oil systems can be described as
(1)
(2)
(3)
(4)
(5)
(6)where *k*_{r} , *S* and *n* denote the relative permeability, fluid saturation and Corey exponent, respectively. Subscripts, w, o and g represent water, oil and gas, respectively, while subscripts i and r denote the initial and residual saturations. is the pore-size distribution index.

Three-phase saturation functions, which are important to account for multiphase flow interactions in the three-phase flow regions generated during WAG injection, were computed using the Stone II model (Stone 1973), while hysteresis in the relative permeabilities during alternate drainage and imbibition cycles was modelled using the Killough (1976) hysteresis model. For fluid-displacement processes where the capillary pressure drop is much less than the drop in viscous pressure at the scale of the grid resolution (such as in this study), capillary pressure hysteresis effects are negligible. We tested simulations with and without capillary pressure hysteresis, and the results were virtually identical. It should be noted that this assumption is only valid to the extent that the matrix permeability and/or fracture intensity are high enough for viscous pressure to adequately describe the fluid displacement: that is, the matrix saturation does not significantly lag behind the displacement front in the fractures.

### Fracture characterization and discrete fracture network

The unique flow behaviour of fractured carbonate reservoirs is due to the interaction between high-permeability low pore volume fractures and the low-permeability high pore volume matrix. Characterization of the fracture system is therefore critical to ensure accurate reservoir simulations of fractured carbonate reservoirs that form the basis for accurate surrogates. During the investigation of outcrop analogues, fracture characterization involves evaluating data from detailed geological observations in the context of well-established conceptual models for the evolution of the fracture network. Conceptual models for the fracture system include, but are not limited to, pervasive background (or regional) fracture systems, fault-related fracture systems leading to zones of high fracture intensity and bedding-related fracture systems (Bonnet *et al.* 2001; Chesnaux *et al.* 2009; Mäkel 2007; Agada *et al.* 2016).

Here, we assume that the fractures are part of a pervasive background fracture system with volumetric fracture intensities (P32) that vary from 0.05 to 0.2 m^{2} m^{−3}. We selected the range of fracture intensities because the effect of the fracture intensity drastically reduces below 0.05 m^{2} m^{−3} and above 0.2 m^{2} m^{−3}. Our workflow focuses on only including parameter end members that have a significant impact on simulation results (i.e. heavy hitters). Furthermore, we have addressed the impact of the range of fracture intensity on oil production in a previous paper (Agada *et al.* 2016). There we observed ‘tipping points’: that is, we observed that at moderate fracture intensity, simulation outcomes respond to variations in other model uncertainties. At high fracture intensity, however, the simulation response was the same regardless of the variation in other parameters because the fractures form connected high-permeability channels that become the main control on fluid flow in the reservoir.

The fractures are modelled using a discrete fracture network (DFN) approach, which is thought to capture the connectivity and scale-dependent heterogeneity of fracture systems (Dershowitz *et al.* 2000; Mäkel 2007). The fracture data are obtained from detailed observations of the Amellago outcrop during extensive field mapping using high-resolution photopanels and LiDAR (Light Detection And Radar). Three intersecting fracture sets are evaluated (Fig. 3). On average, the dip azimuth for each fracture set varies between 95°, 135° and 165°, while the dip angle varies between 74°, 75° and 76° (Fig. 4). The mean fracture length is 20 m, while the variation of the fracture length with respect to the mean is defined using an exponential distribution. Fracture apertures with a mean of 0.5 mm are used to estimate fracture permeabilities with the cubic law. Fractures are assumed to be open in all scenarios. Vertical injection and production wells intersect fractures in all cases.

Fracture network flow parameters, including equivalent permeability tensors and shape factors, were obtained by upscaling the fracture networks to the grid cells of the simulation model (Fig. 5). We have chosen to use the modified Oda (1985) DFN upscaling method, which is more computationally efficient than flow-based DFN upscaling and is accurate for fracture systems with good connectivity. A dual-porosity dual-permeability formulation (e.g. Kazemi *et al.* 1992) was used to couple fracture–matrix fluid flow due to the significant heterogeneity and hydraulic continuity in the matrix. The exchange of fluids between the fractures and the matrix was modelled using the Gilman & Kazemi (1983) transfer function.

## Set-up of data-driven surrogate models

Data-driven surrogates were generated for two objective functions: the oil recovery factor and the net gas utilization factor (GUF). The oil recovery factor indicates the fraction of oil that is recovered from the reservoir, while the GUF indicates the net amount of gas that is injected into the reservoir per barrel of oil produced from the reservoir. In general, it is economically desirable to maximize oil recovery and minimize GUF.

The equations used to generate data-driven surrogates with polynomial regression and polynomial chaos expansions are presented below. We assume that second-order polynomials are sufficient to capture the non-linear interactions of the uncertain input parameters in this study. Higher-order polynomials can be employed to incorporate more non-linearity at greater computational expense. The general equation for second-order polynomial regression is given by: (7)whereis the objective function, are the uncertain parameters, is the intercept, are the coefficients of the linear terms, are the coefficients of the quadratic terms and are the coefficients of interaction terms.

The polynomial chaos expansion for a model output is given by:
(8)where the coefficient *c _{i}* represents the dependence of the model output on the input parameters

*x*. The function is a simplified form of the multivariate orthogonal polynomial basis for

*x*. The number of

*M*terms in the expansion depends on the total number of input parameters

*N*and the order

*d*of the expansion, according to equation (9) (Hosder 2012): (9)Subsequently, the unknown coefficients in the expansion (equation 2) are evaluated using a non-intrusive least-square collocation method (Chen

*et al.*2009). For arbitrary polynomial chaos expansion, the data-driven polynomial basis for one random variable of degree

*k*is given by: (10)Here are the coefficients in . The coefficients are constructed in such a way that the polynomials in equation (10) form a basis that is orthogonal in arbitrarily given distributions of data (Oladyshkin

*et al.*2011). A detailed description of the polynomial basis functions used in sparse polynomial chaos expansion is presented in Elsheikh

*et al.*(2014).

### Parameter screening

Parameter screening is usually the first step in the process of generating surrogate models. Here, full-physics simulation using the minimum and maximum values of uncertain parameters is employed to identify and rank input variables with significant impact (i.e. heavy hitters) on the objective function(s). The heavy hitters are then coupled with experimental design techniques to generate surrogate models. Sensitivity analysis carried out by varying one parameter at a time is a simple and well-known procedure for parameter screening. The screening results indicate that the most important uncertainties affecting CO_{2} WAG injection in this reservoir include the fracture permeability, the matrix wettability (*KR*), the fault transmissibility (*FT*) and the trapped gas saturation (*S*_{gt}) (Fig. 6).

The screening study shows that as uncertain parameters vary between their minimum and maximum values, increasing the fracture permeability typically results in up to a 16% decrease in the oil recovered and the GUF. Conversely, increasing the maximum trapped gas saturation, wettability or fault transmissibility increases the oil recovery (and GUF) by 15%. Only uncertainties that show a significant impact on the simulation model response, as indicated in Figure 6, are considered in the subsequent experimental design and surrogate model set-up.

### Experimental design

A Box–Behnken design (Box *et al.* 1978) was used to vary the uncertain parameters (Table 2). Identical well configurations, flow rates and pressure constraints were maintained to ensure that the variability in simulation outcomes was due to the main uncertain parameters.

Fracture permeability multipliers were varied between 0.1 and 10 to account for end-member fracture permeability scenarios. The fault transmissibility was varied between low-transmissibility scenarios, where the faults were completely sealing (*FT* = 0), and high-transmissibility scenarios, where the faults were fully conductive (*FT* = 1). Relative permeability and capillary pressure curves varied from oil-wet to water-wet, corresponding to the low and high end members, respectively. The trapped gas saturation varied from zero (no hysteresis) to a maximum trapped gas saturation of 0.4.

### Surrogate modelling and validation

Full-physics reservoir simulations were carried out employing the Box–Behnken experimental design using a training dataset of 312 samples. The simulation input variables and the corresponding outputs were used to train polynomial regression, sparse polynomial chaos (sPCE) and arbitrary polynomial chaos (aPCE) algorithms to generate approximations of the simulator output. To test the prediction accuracy of the surrogate models, we evaluated validation simulations using 105 Latin Hypercube samples and compared the response of the data-driven surrogates to the numerical simulation output. We used the coefficient of determination (*R*^{2}), adjusted coefficient of determination () and root mean square error (RMSE) as goodness-of-fit measures. *R*^{2} indicates how well the data-driven surrogates predict full-physics simulation results. is a modified form of the coefficient of determination that accounts for the number of regression coefficients in the surrogate equation. RMSE is the root mean square error of the data-driven surrogate response compared to the full-physics simulation. In general, higher values of *R*^{2}, higher values of and lower values of RMSE indicate higher surrogate accuracy. Mathematically, *R*^{2}, and RMSE are given by:
(11)
(12)
(13)where *y* denotes the full-physics simulation result (i.e. oil recovery factor or GUF) used to train the surrogates, is the mean value of *N* full-physics simulation results evaluated at the end of production, *f* represents the surrogate predictions corresponding to *N* simulation cases and *K* denotes the number of regression parameters utilized in the surrogate model. By incorporating the number of regression parameters, provides a conservative estimate of the surrogate accuracy.

### Optimization with a genetic algorithm

The surrogate models were coupled with the genetic algorithm to optimize the oil recovery and GUF based on a modelling framework in which multiple realizations of the geological model are considered while varying operational (i.e. engineering) parameters, such as well locations and flow rates, to optimize the oil recovery and GUF. Here, we assume that multiple realizations of the geological model are obtained when different combinations of the DFN model, saturation functions, residually trapped fractions and fault transmissibility interact with the matrix, based on the experimental design. Therefore, each combination represents a unique fracture–matrix geological model scenario. Subsequently, the operational parameters of the central injector in the five-spot well pattern are varied to optimize the oil recovery and GUF across the full range of fracture–matrix geological scenarios. During the optimization process, the location of the central injector is varied within an area of 120 m^{2}, while injection rates are varied up to a maximum of 1987 m^{3}/day, set to ensure that the well bottom-hole pressures generated during injection are below the formation fracture pressure at all times.

The genetic algorithm optimizes an objective function by a process of selection, mutation and recombination, as shown in Algorithm 1 (Koziel & Yang 2011). We used a population size of 50 and a cross-over probability of 0.8 to ensure that the algorithm captured a large search space and to avoid being trapped in local minima. Larger population sizes had no effect on the optimization results. The algorithm was evaluated for 50 generations (i.e. iterations) to obtain optimum results based on a function tolerance of 10^{−6}. The function tolerance defines the minimum difference between new and existing optimal values so that the optimization iteration is terminated when a predefined function tolerance is reached.

1 | Start |

2 | Initialize solutions x of population _{i}λ |

3 | Evaluate objective function for the solutions x in _{i}λ |

4 | Repeat |

5 | For i = 0 to β |

6 | Select ρ parents from λ |

7 | Create new x by recombination_{i} |

8 | Mutate x_{i} |

9 | Evaluate objective function for x_{i} |

10 | Add x to _{i}λ′ |

11 | Next |

12 | Select µ parents from λ′ and form new λ |

13 | Until termination condition |

14 | End |

## Results

### Surrogate training with full-physics simulations

We used black oil simulations in IMEX^{TM} as a basis for generating the data-driven surrogates. The full-physics flow simulations indicate channelling during hydrocarbon displacement in the reservoir, which makes CO_{2} WAG injection a desirable recovery option because WAG injection can ensure better mobility control and frontal stability to improve the contact of injected fluids with unswept zones (Fig. 7a). Buoyant CO_{2} migration to the top of the reservoir owing to gas–oil density differences is also apparent (Fig. 7b). Furthermore, the full-physics simulations provide the relevant training and testing datasets for generating the proxy models. On average, the computational cost for each black oil simulation run is 8.2 h when the simulation is truncated after 1500 days. Considering that simulations were evaluated for 312 Box–Behnken samples and 105 Latin Hypercube samples, truncating each simulation after 1500 days seemed to be the most feasible way to complete the entire study within a reasonable time frame.

The oil recovery and GUF profiles for the training simulations (Fig. 8a, b) show a range of simulation responses based on various combinations of uncertain input parameters. As expected, the oil recovery increases as alternate cycles of water and gas are injected into the reservoir. The GUF, however, increases initially but begins to decrease as the reservoir becomes gas saturated.

### Oil recovery surrogate prediction

The response surfaces that can be generated from training simulations using the three data-driven surrogate models (PR, sPCE and aPCE) are very similar and the relative error between response surfaces is approximately 0.002. For analysis, we focused on second-order aPCE response surfaces (Fig. 9). We observed from the four response surfaces that the horizontal fracture permeability always has the highest impact on the simulated oil recovery. This clear link between an increase in the fracture connectivity and a decrease in the oil recovery is to be expected because an increased connectivity across the fracture network results in a reduction in the residence time of injected fluids and, subsequently, a reduction in the effectiveness of oil recovery from the matrix due to gravity drainage and capillary imbibition.

Consequently, the highest overall oil recovery is observed when the fracture permeability is low and the matrix is water-wet and, hence, imbibition is most effective (Fig. 9c). The lowest overall recovery is observed when both the vertical and horizontal fracture permeabilities are at their highest values (Fig. 9d), indicating that, when the fractures are well connected, fracture networks form fluid-flow highways that lead to the rapid transport of injected fluids and thereby resulting in low oil recovery. Increased fault transmissibility (Fig. 9a) allows the injected fluids to access all parts of the reservoir more readily, which improves recovery. Similarly, an increase in the maximum trapped gas saturation reduces the overall gas mobility and leads to improved recovery predictions (Fig. 9b). This is because a reduction in the gas mobility increases the stability of the gas–water mobility front, delays gas breakthrough and improves the contact of gas with residual oil, thereby ensuring better microscopic and macroscopic sweep of the reservoir.

On average, the computational cost for each surrogate model evaluation is 13.2 s, indicating a significant reduction in CPU time when compared with the 8.2 h CPU time required for a single full-physics simulation. However, consideration must be given to the overhead associated with creating the surrogates. The overhead for creating the surrogates is directly proportional to the number of training and testing simulations that are required to generate robust surrogates. Once the simulations are run, computer codes in MATLAB are applied to the data to generate surrogates within seconds. It is difficult to quantify the time required to write MATLAB codes or analyse the results at each level of modelling complexity as these depend on the experience or expertise of the modeller. For a modeller who fully understands the workflow, a minimum of 7 days’ simulation using a high-performance computer cluster with 20 processors would be required to generate training/testing simulations and the surrogate models in this study.

### Gas utilization factor surrogate prediction

The net gas utilization factor (GUF) generally increases with increasing horizontal fracture permeability (Fig. 9). This increase is caused by high-permeability fracture networks that allow more gas flow per barrel of oil recovered from the matrix due to the rapid fluid transport in the fractures. We notice that the fault transmissibility has a limited effect on the GUF (Fig. 10a). This is because the fault transmissibility impacts oil and gas migration in the reservoir in the same way: when the fault transmissibility is low, the flow of gas and oil across the faults is limited; when the fault transmissibility is high, the flow of gas and oil across the faults is enhanced.

The GUF increases with higher values of gas trapping due to hysteresis (Fig. 10b). It is well known that relative permeabilities depend on the saturation path during hydrocarbon displacement cycles (e.g. Larsen & Skauge 1998). The cycle dependence influences the amount of gas trapped in the subsurface, thereby resulting in higher GUFs as the trapped gas fraction increases. Conversely, the GUF decreases with increasing water-wetness (Fig. 10c). Although the amount of trapped non-wetting gas is higher in a water-wet scenario, the oil recovery is also very high (Fig. 9c). Hence, the GUF, which is a ratio of net gas utilized to oil produced, decreases with increasing water-wetness. The GUF is highest (Fig. 10d) when the vertical and horizontal fracture permeabilities are high, which indicates rapid gas transport and accumulation at the top of the reservoir when the fracture permeability is very high.

### Surrogate validation: goodness-of-fit measures

To validate the surrogate models that were obtained from the training simulation, we compare the predictions of the surrogates with results from full-physics simulations and generate the relevant cross-plots to estimate goodness-of-fit measures. The coefficient of determination (*R*^{2}) for oil recovery obtained from polynomial regression, sparse polynomial chaos (sPCE) and arbitrary polynomial chaos (aPCE) is 0.9635, 0.9768 and 0.9770, respectively (Fig. 11; Table 3). The *R*^{2} value indicates that all of the data-driven surrogates are valid and that the PCE models yield a slightly better approximation of the actual simulation model. The goodness-of-fit measures for the GUF also show that the PCE models give consistently better predictions of the actual simulation results (Fig. 11; Table 3). A comparison of the PCE models for both oil recovery and GUF indicates that the aPCE models give marginally better results compared to the sPCE models. However, it is expected that further tuning of the sPCE model may allow us to eradicate the difference between the aPCE and sPCE model. Subsequent relative error analysis, Monte Carlo simulations and model optimization focus on proxy models from aPCE.

### Surrogate validation: relative error

Relative error response surfaces (Figs 12 and 13) show the discrepancy between the response surfaces from PR and aPCE. In comparison to aPCE, PR always over-predicts the oil recovery (Fig. 12) and under-predicts the GUF (Fig. 13). Analysis of the relative error between the aPCE and PR response surfaces shows that, although the overall error is minimal, the difference in the prediction is most evident in the middle of the design space. This is because the deterministic Box–Behnken experimental design used in setting up the training simulations generates samples that more adequately capture the actual model behaviour at the boundaries of the design space but have greater uncertainty at the middle of the design space.

To further investigate the deterministic sampling bias, we generated test simulations using the more random Latin Hypercube experimental design (Fig. 14). We observe that when random samples are added to the design, the mismatch between PR and aPCE prediction has a wider spread in the design space. However, the absolute error from such a random design is greater than the error from the deterministic design.

The final choice of which design method to employ should be a function of how well the surrogate predicts the behaviour of the actual simulation in any given scenario. Furthermore, combining different experimental design techniques, as we have done in this study, could also be a reliable way to account for uncertainties that may propagate from the experimental design techniques used to generate the data-driven surrogates.

### Surrogate-based uncertainty quantification and probabilistic assessment

Monte Carlo simulations carried out using the aPCE surrogate and evaluated 65 000 times were used to determine the cumulative distribution functions for oil recovery and gas utilization factor over the range of uncertainty for the input parameters (Fig. 15). The 10th, 50th and 90th (P10, P50 and P90) percentile probabilistic estimates for oil recovery are 0.31, 0.34 and 0.37, respectively, for simulation of immiscible CO_{2} WAG injection. In addition, the P10, P50 and P90 probabilistic estimates are 0.45, 0.53 and 0.60 for GUF.

### Surrogate-based optimization

The aPCE surrogate model coupled with the genetic algorithm was employed to optimize the oil recovery and GUF. Optimization using the genetic algorithm progresses as a minimization of the fitness value (i.e. −1 × objective function), with the mean fitness value improving during each generation until the optimum is reached after 50 generations as determined by the predefined function tolerance (Fig. 16).

As discussed in the earlier subsection on ‘Optimization with a genetic algorithm’, the aPCE surrogate is coupled with the genetic algorithm to optimize the oil recovery and GUF based on a framework where multiple realizations of the geological model are considered while varying operational parameters, such as well locations and flow rates (Table 4). It is assumed that multiple realizations of the geological model are obtained when different combinations of the DFN model, wettability scenario, residually trapped fraction and fault transmissibility interact with the matrix, based on an experimental design with each combination representing a unique fracture–matrix scenario. Here, the operational parameters of the central injector in the five-spot well pattern (Fig. 7) are varied to optimize the oil recovery and GUF across the full range of fracture–matrix geological scenarios. Figure 17 illustrates convergence of the oil recovery (and GUF) to the optimum after 2000 evaluations of the surrogate model based on the genetic algorithm.

When the surrogate-based optimization results are compared to evaluations of the full-physics model using the optimum input parameters, an absolute error of 0.0048 and 0.0043 is obtained for the oil recovery and GUF, respectively. We observe a few random sub-optimal solutions as the algorithm evolves and converges to the optimum due to the random component in the genetic algorithm that allows the search during optimization to move towards sub-optimal solutions occasionally in order to seek out the global optimum objective (Fig. 17). These random solutions increase our confidence that the algorithm adequately explores the parameter space and obtains a global optimum.

In this study, it was sufficient to optimize a single objective (e.g. oil recovery). Since the oil recovered is inversely proportional to the GUF, maximizing the oil recovery concurrently minimizes the GUF, which are both desirable outcomes. To study the possibility of optimizing many competing objectives, however, multi-objective optimization is required. Multi-objective optimization finds a set of optimal solutions in the range between two (or more) optima. The set of optimal solutions, known as the Pareto front, should ideally have a good spread (Deb 2014). The surrogates generated in this study can be utilized for multi-objective optimization at no additional cost (i.e. no additional simulation runs).

## Discussion

Reservoir simulation and optimization of CO_{2} WAG injection in fractured carbonate reservoirs is a complex and time-consuming process. By applying surrogate models to approximate full-physics numerical simulations using a limited number of training and testing simulations that cover the parameter space and account for key uncertainties, we can significantly reduce the overall modelling time. The surrogates can then help to understand the respective dependencies and correlations of uncertain input parameters, and contribute to rapid simulation and optimization under uncertainty.

Response surfaces generated using surrogate models show that fault transmissibility, fracture network properties, matrix wettability, residual trapping due to hysteresis and the fracture network properties are key uncertainties that significantly impact the prediction of oil recovery and gas utilization for fractured carbonate reservoirs. Furthermore, the interrelated effect of these uncertain parameters is often greater than the impact of one parameter on the model outcome. For example, the interrelated effect of high wettability and low fracture network permeability on oil recovery is higher than the end-member effect of either of these parameters on oil recovery. Such observations necessitate the application of experimental design techniques that improve evaluation of the parameter space and capture the interactions of major uncertainties. Here, Box–Behnken and Latin Hypercube experimental designs were used to generate a large number of training and testing samples (i.e. full-physics simulations), respectively.

The chosen experimental design is a source of uncertainty in the surrogate modelling workflow that may propagate to the surrogate model prediction because deterministic designs could be biased towards the boundaries of the design, while random designs may need more training and testing to constrain. By combining deterministic (Box–Behnken) and random (Latin Hypercube) experimental designs to account for the uncertainty from sampling bias, the workflow employed in this study improves the reliability of the surrogate model predictions.

Although, it is considerably faster to evaluate a data-driven surrogate than to run a full simulation case, it is self-evident that such a simple model must be constructed and used with care. The accuracy of the model should be thoroughly validated in order to estimate its prediction capability. Hence, the application of appropriate goodness-of-fit measures, such as the coefficient of determination (*R*^{2}) and the root mean square error (RMSE), is essential to ensure that the surrogate reliably replaces the full simulation model inside and outside of the design space. When the surrogates generated in this study are compared using *R*^{2} and RMSE, surrogate results from polynomial chaos expansion – both sparse and arbitrary PCE – consistently give better results than traditional polynomial regression.

The work presented in this paper provides a solid basis for diverse applications of PCE-based surrogates to several aspects of fractured reservoir simulation and optimization that would benefit from the computationally efficient workflow. First, the PCE-based surrogates can be applied to advanced global sensitivity analysis using Sobol indices (e.g. Buzzard 2012; Oladyshkin *et al.* 2012). As discussed in the earlier section on ‘Set-up of data-driven surrogate models’, the PCE-based surrogate output is presented as an orthogonal decomposition through the uncertain input parameters. The orthogonal decomposition can be employed directly through Sobol sensitivity indices to quantify the relative importance of uncertain input parameters on the final prediction. Once the PCE-based surrogate model is generated, the sensitivity indices can be constructed on-the-fly using analytical relationships, thereby providing information on the high-order interaction between contributing model parameters (e.g. Oladyshkin *et al.* 2012).

Second, robust optimization under geological uncertainty (e.g. Mulvey *et al.* 1995) can be achieved using the developed surrogates. During robust optimization, a given objective function is optimized by modifying engineering parameters (e.g. well location and flow rates) for a wide range of geological scenarios, thereby capturing geological uncertainty in the optimization process. Typically, robust optimization progresses by optimizing over the average and standard deviation of model results generated with different geological realizations. Because the average response surface obtained during robust optimization is much smoother than the response surfaces for individual realizations, it can potentially reduce the total number of simulations needed to build surrogates.

Third, multi-objective optimization can be carried out to optimize competing objectives (e.g. Deb 2014). For example, the oil recovery and net present value can be maximized while concurrently minimizing the GUF and water cut. When multi-objective optimization is employed in the framework of geological uncertainty, the objective function will need to reflect the impact of geological uncertainties by using either a mean value or the mean value combined with the standard deviation for each objective. Subsequently, an optimization algorithm (e.g. the classic genetic algorithm or the more recent Non-dominated Sorting Genetic Algorithm II) is run on the PCE-based surrogate to obtain a Pareto-optimal front representing competing objectives. The accuracy of the optimization outcome can be progressively improved by re-training the surrogates along the Pareto-optimal front and re-running the optimization algorithm.

This study seeks to demonstrate how surrogate models for fractured carbonate reservoirs can be coupled with a wide range of reservoir optimization techniques. Therefore, it should be noted that we do not focus on the details of specific optimization algorithms. We use the well-known genetic algorithm, but more advanced techniques that apply efficient gradient-based or stochastic techniques to field-scale reservoir optimization have been widely researched (e.g. Dowsland & Thompson 2012; Esmin *et al.* 2015).

## Conclusion

The purpose of this study was to generate, analyse and compare non-intrusive data-driven surrogate modelling techniques, and to illustrate their application to the simulation and optimization of CO_{2} WAG injection in fractured carbonate reservoirs. The synergistic application of experimental design, data-driven surrogates and genetic algorithms for CO_{2} WAG simulation and optimization represents a notable contribution of this work. We have shown that data-driven surrogates from PCE (arbitrary polynomial chaos expansion (aPCE) and sparse polynomial chaos expansion (sPCE)) show a higher degree of accuracy in predicting oil recovery and GUF compared to surrogates from polynomial regression. PCE techniques capture the synergistic effects between low- and high-order polynomial terms, and thereby provide higher accuracy. In particular, aPCE most closely approximates the actual simulations when trained and tested.

We demonstrate that data-driven surrogate models significantly reduce the computational cost by completing each model evaluation in 13.2 s compared to 8.2 h for full-physics simulation using the inputs. Hence, we are able to rapidly evaluate the dependency and correlation of uncertain input parameters as they influence the oil recovery and GUF. For example, we find that low fracture permeabilities, more water-wetting saturation functions, high residual trapping due to hysteresis and high fault transmissibilities are favourable to achieve higher oil recovery. When the computationally efficient surrogates are coupled with the genetic algorithm, over 2000 model evaluations are rapidly carried out to optimize the oil recovery and show the combination of input variables that are favourable to the optimum recovery scenario.

## Acknowledgements and Funding

The authors would like to thank the ExxonMobil (FC)^{2} Research Alliance for funding this project. Sebastian Geiger is grateful to Foundation CMG for supporting his Chair in Carbonate Reservoir Simulation. We acknowledge Computer Modelling Group, Schlumberger and Golder Associates for providing access to commercial software.

- © 2017 The Author(s)