Why Model Based Bioequivalence?

Traditional bioequivalence (BE) study design and statistical methods are well established (1,2) and are based on non compartmental analysis (NCA). There are, however, a number of clinical scenarios where a traditional BE study is impractical (3,4). Examples include:

There is value in developing method to meet the BE regulatory requirements for such drugs. Model Based Bio Equivalence (MBBE) is one method to achieve this.


The key parameter for implementing MBBE is the power of simulated studies. The power is key, rather than the outcome of any single simulated study, as the outcome of any single study will depend on the random seed chosen. A more general and robust answer regarding the likelihood BE can be obtained by simulating many studies, and examining the distribution of outcomes, and therefore, the predicted distribution of outcomes of actual studies, were such studies to be performed. The use of nonlinear mixed effect models and Monte Carlo simulation is well established for describing such distributions. Typically the data used for development of a population PK model do not come from a BE study. They may come from other, e.g., parallel design studies and/or sparse sampling. If data from a well designed BE study were available, Virtual BE studies would not be needed.

Model uncertainty and model averaging

“All models are wrong, some models are useful” [5]. We interpret this well known aphorism to mean that if a single model (structure and parameters) is used to describe the data, the prediction will be wrong (or at least not exactly correct). But, if “some” models (e.g., multiple models averaged together) are used, it may be useful, or perhaps at least less wrong. Aoki[8] showed that for simulated studies, model averaging in most cases increases the accuracy of predictions. In Monte Carlo simulation we typically include parameter uncertainty for a single model. With model averaging, we extend model uncertainty from a single model with uncertain parameters to uncertainty about the model structure as well. Rather than a single (incorrect) model, a number of “adequate” models are examined. Adequate models are models that meet some set of minimal requirements in describing the data. The method described by Aoki [8] is used for model averaging. A set of bootstrap samples from the original data set are generated. NONMEM is then used to estimated parameters for each adequate-model/bootstrap-data set combination is performed. For each sample, the model with the best (lowest) Bayesian Information Criteria (BIC [9]) is selected. The Monte Carlo simulation is then done with the model structure and parameter estimates selected for each bootstrap data set, thus capturing uncertainty in both the model structure and parameters.

The simulations can be performed using a data set different from the source data, as, typically, the reason for doing MBBE is that there are no formal BE studies available. In this way, non-BE study data can be used for model definition and then a formal BE study (e.g., 4 period, single dose cross over study) can be simulated. The NCA parameters for each (simulated) subject in each study are then calculated using and algorithm derived. The workflow is:

  1. Develop model(s) of the data. Model development and criteria for an “adequate” model are not discussed, but typically would include separate estimation by formulation for all absorption parameters (e.g, F, Ka, Lag time, zero order infusion duration, transit compartment rate constants etc.). Different structural models for the absorption of refernce vs test formulation may also be considered. An adequate model may also include between occasion variability on Volume and elimination terms, if multiple periods are available in the observed data.

  2. For each bootstrap sample, select the “best” model. In the current implementation, the best model is selected based on Bayesian information criteria (BIC). That is, parameters for each adequate model are estimated for each bootstrap sample. For each bootstrap sample, the model and associated parameter estimates with the best (lowest) BIC is selected. In this way, the uncertainty of both the model structure and the model parameters is described.

  3. From the selected models, perform Monte Carlo simulation, with a reasonable study design (e.g., 4 period, cross over, rich sampling, 50 subjects).

  4. Determine power of the analysis by calculating two one side t test statistics [6] on each simulated study.

Two options exist for how to combine those different models/parameter estimate. First, the data from each of the simulations can be used as simulated, with a single model per simulated study (the “model_averaging_by”: “study” option). This represents the possibility that any given model may be “correct” for all subjects, we’re just not certain which one it is. Alternatively, the study data can be recombined such that each study has a population of individuals based on the representation of overall models. (the “model_averaging_by”: “subject” option). For this option, any given study will be comprised of the same set of subjects (demographics, sample time etc.), but have different structure models. The interpretation here is that there is a distribution of models within the population (similar to the distribution of parameters). So, for the population we have a set prior probabilities for the structural model, but uncertainty about which model any given subject follows. This is analogous to a prior distribution (THETA and OMEGA in NONMEM) for parameters, but uncertainty about the “true value” parameters.

An overall diagram (from Aoki [8]) for model averaging is shown below:

From Aoki, Y., Röshammar, D., Hamrén, B. et al. Model selection and
averaging of nonlinear mixed-effect models for robust phase III dose
selection. J Pharmacokinet Pharmacodyn 44, 581--597 (2017)
From Aoki, Y., Röshammar, D., Hamrén, B. et al. Model selection and averaging of nonlinear mixed-effect models for robust phase III dose selection. J Pharmacokinetic Pharmacodynamic 44, 581–597 (2017)


Nyberg et. al [10] described using a SADDLE_RESET as a check for local non-identifiability. MBBE included an option

"use_check_identifiable": true,

to instruct the algorithm to determine whether any model resulting from the bootstrap is identifiable, based on the largest absolute fractional difference between pre and post saddle reset parameters of delta_parms. delta_parms can be set in the json argument file e.g:

"delta_parms": 0.2,


Traditional bioequivalence is assessed in a single study population (N=1, drawn from a universe of possible study populations). Thus there is some degree of “luck” associated with whether or not the formulae are deemed bioequivalent. MBBE permits a more robust approach, not dependent on a single draw from a distribution, that is, the power of the given study design to find that the formulae are bioequivalent. The success of each study is then calculated based on the method described by Schütz (11) in the replicateBE package in R. The resulting power then is simply the fraction of simulated studies that successfully demonstrate bioequivalence for each NCA endpoint.


MBBE is an R package available on CRAN. Details of implementation are provide in the Step-by-Step Vignette. Several files are required, include observed and simulation data sets, control file and an options file. Sample files with explanation is given below.

Input json File

{ "run_dir": "c:/fda/mbbe",

"model_source": [







"num_parallel": 32,

"crash_value": 999999,

"nmfe_path": "c:/nm74g64/util/nmfe74.bat",

"delta_parms": 0.2,

"use_check_identifiable": true,

"NCA_end_time": 72,

"rndseed": 1,

"simulation_data_path": "U:/mbbe/inst/examples/data_sim.csv",

"ngroups": 4, "samp_size": 100,

"reference_groups": [ 1, 2 ],

"test_groups": [3, 4 ],

"plan": "multisession",

"alpha_error": 0.05,

"NTID": false,

"model_averaging_by": "study",

"user_R\_code": true,

"R_code_path": "u:/mbbe/R/RPenaltyCode.r" }

Descriptions of these options are below:

The Analysis data set.

The bootstrap analysis data set can have any form required for the NONMEM model. However, several data items are required. These are:

The analysis data set need not have multiple periods or sequences (in which case all values can be 0, or missing). However the $INPUT record is copied from the analysis control file into the simulation control file, and the simulation control file MUST include GROUP, PERIOD, SEQ and TRT data items for statistical testing.

The Simulation data set

Typically, the simulation data set will be comprised of:

  1. Bioavailability-and-Bioequivalence-Studies-Submitted-in-NDAs-or-INDs—–General-Considerations .pdf

  2. Gabrielsson, J. and Weiner, D. (2001). Pharmacokinetic and pharmacodynamic data analysis: concepts and applications, volume 2. CRC Press

  3. Hu, C., Moore, K. H., Kim, Y. H., and Sale, M. E. (2004). Statistical issues in a modeling approach to assessing bioequivalence or pk similarity with presence of sparsely sampled subjects. Journal of pharmacokinetics and pharmacodynamics, 31(4):321–3

  4. Seng Yue C, Ozdin D, Selber-Hnatiw S, Ducharme MP. Opportunities and Challenges Related to the Implementation of Model-Based Bioequivalence Criteria. Clin Pharmacol Ther. 2019 Feb;105(2):350-362. doi: 10.1002/cpt.1270. Epub 2019 Jan 8.

  5. Box, George E. P. (1976), “Science and statistics” (PDF), Journal of the American Statistical Association71 (356): 791–799,

  6. Schuirmann, D.J. A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 657–680 (1987).

  7. Lunn, D.J. Automated covariate selection and Bayesian model averaging in population PK/PD models. J Pharmacokinet Pharmacodyn 35, 85–100 (2008).

  8. Aoki, Y., Röshammar, D., Hamrén, B. et al. Model selection and averaging of nonlinear mixed-effect models for robust phase III dose selection. J Pharmacokinet Pharmacodyn 44, 581–597 (2017).

  9. Neath, A.A. and Cavanaugh, J.E. (2012), The Bayesian information criterion: background, derivation, and applications. WIREs Comp Stat, 4: 199-203.

  10. Nyberg HM, Hooker AC, Bauer RJ, Aoki Y. SADDLE_RESET: more robust parameter estimation with a check for local practical identifiability.