A mathematical model of cocoa bean fermentation

Cocoa bean fermentation relies on the sequential activation of several microbial populations, triggering a temporal pattern of biochemical transformations. Understanding this complex process is of tremendous importance as it is known to form the precursors of the resulting chocolate’s flavour and taste. At the same time, cocoa bean fermentation is one of the least controlled processes in the food industry. Here, a quantitative model of cocoa bean fermentation is constructed based on available microbiological and biochemical knowledge. The model is formulated as a system of coupled ordinary differential equations with two distinct types of state variables: (i) metabolite concentrations of glucose, fructose, ethanol, lactic acid and acetic acid and (ii) population sizes of yeast, lactic acid bacteria and acetic acid bacteria. We demonstrate that the model can quantitatively describe existing fermentation time series and that the estimated parameters, obtained by a Bayesian framework, can be used to extract and interpret differences in environmental conditions. The proposed model is a valuable tool towards a mechanistic understanding of this complex biochemical process, and can serve as a starting point for hypothesis testing of new systemic adjustments. In addition to providing the first quantitative mathematical model of cocoa bean fermentation, the purpose of our investigation is to show how differences in estimated parameter values for two experiments allow us to deduce differences in experimental conditions.


Introduction
The fermentation of cocoa beans is recognized as a key step in cocoa processing in terms of the development of chocolate's flavour and aroma [1,2]. It occurs mainly in the pulp, i.e. a white mucilagenous mass that surrounds the bean, where three major microbial groups drive mostly the whole process whose main activity occurs in a consecutively way ( subsequently surpassed by lactic acid bacteria (LAB), and after the decline of these two first groups, acetic acid bacteria (AAB) take over [4][5][6][7]. This so-called three-phase process, depending on the region and local farm practices, is expected to happen within a time frame of 2 -10 days [2,3,[8][9][10].
As a result of the fermentation, a series of biochemical reactions are triggered in the raw material, the qualitative characteristics of which have been exhaustively described in terms of the microbial groups involved and the associated metabolic alterations [9,11,12].
Despite the fact that this process is of high industrial relevance, there are hardly any attempts to construct a mathematical model of cocoa bean fermentation. So far, the existing modelling attempts are either focusing on specific post-fermentation steps such as drying kinetics [13,14], restricted to the sequential interaction of microbial communities using metabolic flux analysis [15,16] or kinetic approaches that 'cannot explain the dynamics in microbial population' [17].
The reasons for this are manifold, among them, the lacking of control over the fermentation process itself as well as the systemic complexity in terms of involved microbial communities. On the one hand, cocoa bean fermentation is conducted in a spontaneous way unlike the majority of other food fermentation processes [18] with a huge diversity of techniques and devices, e.g. heaps, boxes, baskets, trays, sacks and platforms [2,8,18]. Owing to the lack of control, it is difficult to identify the crucial parameters and key variables required for the formulation of an appropriate model.
On the other hand, in contrast to other relevant industrial fermentation processes such as those of beer and wine, the fermentation in cocoa involves microbial community dynamics of three major microbial groups, i.e. Y, LAB and AAB, which are, in turn, represented by several different strains [3,4,6,7]. Hence, the complexity is precisely one of the biggest challenges to overcome since growth modelling of microorganisms has been classically applied under much more controlled conditions that involve single strain cultures where mortality phenomena have been scarcely considered [19]. Consequently, this needs to be taken into account for an approximation of the cocoa bean fermentation process.
Cocoa bean fermentation is a prototypical situation for the application of modelling using coupled nonlinear ordinary differential equations (ODE): The initial situation displays a rich diversity with a multitude of influencing factors and the result of the dynamic process, the fermented cocoa bean, is of high relevance for the subsequent industrial processing steps and for the quality of the final product, chocolate.
Here, we present a one-compartment model for the cocoa bean fermentation process using the mathematical concepts of the Monod [20] and Contois [21] equations, assuming that single strain kinetic modelling techniques can describe the growth of mixtures of microbial species belonging to different microbial groups in the same environment. Moreover, microbial death processes are handled by the use of the Chick -Watson mortality law [22].
While conceptionally the modelling approach presented here is rather in the tradition of theoretical biology, the way to analyse and apply the model differs from e.g. a traditional linear stability analysis, as the purpose of the model is predominantly to describe transients in a batch culture [23], rather than asymptotic states as would be expected in continuous cultures.
With the model constructed along these lines, we were able to describe three datasets corresponding to two different cocoa-producing countries where two different fermentation methodologies were implemented. In that way; our model also can interpret differences in the experimental set-up of the two trials conducted under the same methodology, in terms of significant changes in the estimated parameters. This approach serves as a source of elucidation of possible hypotheses on how these parameters are affected by slight changes within a particular region where the fermentation took place.

Experimental data
The experimental data used in this study were reported in Camu et al. [4] and Papalexandratou et al. [24]. In both instances, the predominant cocoa hybrids harnessed by the chocolate industry, Criollo and Forastero, were used as the source of raw material. In the study of Camu et al. [4], the beans were fermented by the heaps method, while for Papalexandratou et al. [24], wooden boxes were used as fermenting devices. The data of Camu et al. [4] were collected in Ghana from seven trials in two field experiments and data of one representative trial were published. The data include measurements of microbial counts of Y, LAB, AAB and total aerobic bacteria. Metabolite time series measured both in pulp and bean are available for glucose (Glc), fructose (Fru), sucrose, lactic acid (LA), acetic acid (Ac), ethanol (EtOH), mannitol, citric acid and succinic acid.
The data reported by Papalexandratou et al. [24] were collected in Brazil from two trials in two field experiments, of which both trials were published as 'box 1' and 'box 2'. The conditions in which both trials were conducted differed slightly. On the one hand, the fermenting mass of box 1 was placed under a metal roof to protect it from weather conditions. On the other hand, the fermentation for box 2 was carried out in a fermentary room. The data include measurements of microbial counts of Y, LAB, AAB and total aerobic bacteria. Metabolite time series measured both in pulp and bean are available for Glc, Fru, sucrose, LA, Ac, EtOH, mannitol and gluconic acid.
In both collections, the fermentation trials took place in a time frame of 6 days with measurements performed at 17 time points for Camu et al. [4] and 14 time points for Papalexandratou et al. [24]. Abiotic factors, i.e. temperature and pH, were measured as well. Cell counts were done by means of malt extract agar, Man-Rogosa-Sharpe agar and deoxycholate-mannitol-sorbitol agar for Y, LAB and AAB, respectively, from both data sources. As metabolite time series, we considered Glc, Fru, EtOH, LA and Ac.

Microbial count units transformation
One of the most common forms of quantifying microbial growth is the count of colony forming units (CFU), specially when dealing with mixtures of microorganisms as the microbial successions are reported in the original works of Camu et al. [4] and Papalexandratou et al. [24]. In this sense, the vast majority of studies involving single-strained microbial growth are reported in terms of dry biomass as well as their dependent constants, i.e. maximum growth rates, mortality rates and yield coefficients.
In order to obtain comparable estimates to those available for species of these microbial groups in the literature, a conversion from CFU to dry biomass units was conducted based on available knowledge as well as geometric deductions for the microbial group involved in the process.
For species within the microbial group of Y, we used the conversion factor that one CFU of S. cerevisiae is equivalent to 15 picograms ( pg), as assumed by Schwabe & Bruggeman [25]. For LAB and AAB, since such conversion factors have not been reported yet, values were inferred by taking into account a geometric approximation based on the usual dimensions of the cells belonging to the genus of Acetobacter and to the species of Lactobacillus plantarum, respectively, according to the Bergey's Manual of Systematic Bacteriology [26,27] and assuming their shape given by a spherocylinder. Thus, using as reference a density value derived from the dry weight of a cell of E. coli of 0.28 pg [28] per micro cubic metre (mm 3 ) [29], the conversion factor between CFU to dry biomass of LAB and AAB were determined as 1.25 and 0.28 pg CFU 21 , respectively (see electronic supplementary material, S1).

Model development
The fermentation of cocoa beans has been described in detail with respect to its microbial dynamics and metabolite kinetics in both pulp and bean [3,4,[6][7][8]. From such descriptions, the fermentation process in rsos.royalsocietypublishing.org R. Soc. open sci. 5: 180964 3 cocoa can be understood as an overlapping succession of microbial activities that mostly occur in the pulp, where three core processes are easily identifiable. These are the conversion of Glc and Fru into EtOH by Y, Glc into LA by LAB and EtOH into Ac by AAB ( figure 1). Further processes such as the conversions of Glc into Ac by LAB and LA into Ac by AAB, have also been described [30]. The interpretation of these processes in a network diagram covering the pulp only, is shown in figure 2, where microbial growth rate is taken into account represented as the uptake of the respective substrates as well as the mortality rates for Y, LAB and AAB.
In our model, we consider a simultaneous growth of the three major microbial groups. The sequential dominance in the process is then emerging from the availability pattern of their respective substrates without taking into account abiotic factors.

Mathematical representation
The process of cocoa bean fermentation can be seen as a batch process that can involve the usual phases of microbial growth, i.e. lag, exponential, stationary and death phase. However, it is known that microbial growth occurring in natural environments might show different patterns [31]. In this sense, the two collections of experimental data on microbial successions expressed as the log CFU showed basically phases that resembled the exponential and death phases without noticeable stationary or lag phases. From the mathematical perspective, such phenomena can be expressed as an ODE for each of the state variables involved in such a way that the growth of microorganisms is dependent on the availability of their respective substrates, together with mortality equations to capture the inherent decay of the populations along time.
The two major effects on population size considered here, exponential growth and death phase, were modelled by different approaches. On the one hand, we use the classical Monod [20] and Contois [21] equations to describe the growth of groups of microorganisms belonging to a same microbial group (namely, Y, LAB and AAB), instead of the common use of these terms for single strain cultures. Accordingly, the growth rates of Y, v 1 on Glc and v 2 on Fru, of LAB, v 3 , and of AAB on EtOH, v 4 , (shown in figure 2) have the form of Monod equations, while the growth of AAB on LA, v 5 , is a Contois equation. The use of a Contois term for v 5 was considered under the assumption that, given that few species of AAB are capable of catabolizing lactic acid [16,32], the growth rate of these species is a function of their population size (see electronic supplementary material, S2). On the other hand, the mortality rates of all microbes, v 6 , v 7 and v 8 , are modelled as Chick-Watson equations [22] considering a nonlinear decay of microbial populations produced by second-and third-order reactions From the set of growth and mortality rate equations defined in table 1, a system of ODEs can be established in order to mathematically express the given network considering the necessary eleven yield coefficients to take into account the amounts of biomass that can be obtained from substrate as well as the amounts of produced metabolites, as shown in the system of ODEs in equations (2.1) to (2.8) that represent Glc, Fru, EtOH, LA, Ac, Y, LAB and AAB, respectively. A complete interpretation of the 24 estimated parameters is given in table 2.
The proposed model (as described in equations (2.1) -(2.8)) relies on three simple general assumptions: (i) relationships between Y and AAB, as well as of LAB and AAB, are of a pure commensalistic nature since there is no competition between them for any substrate, i.e. Glc and Fru, Table 1. Growth and mortality rate equations for the cocoa bean fermentation process. Microbial groups are represented as yeast (Y), lactic acid bacteria (LAB) and acetic acid bacteria (AAB). Metabolites are represented as glucose (Glc), fructose (Fru), ethanol (EtOH), lactic acid (LA) and acetic acid (Ac). Biomass and concentration of metabolites are both shown within square brackets [ ]. Maximum specific growth rates m max are shown of the form m i n max , where i can be either Y, LAB and AAB, and n refers to whether m corresponds to the maximum specific growth of Y on either Glc or Fru, or AAB on either EtOH or LA. Substrate saturation constants for the growth of Y, LAB and AAB are shown of the form K j m , where j can be either Y or LAB and m can be either Glc, Fru, EtOH and LA. Constant mortality rates are shown of the form k i , where i can be either Y, LAB or AAB. growth rate equation mortality rate equation and there is no direct effect upon the growth either of Y or LAB by the uptake of its main products, LA and EtOH, by AAB, respectively [33]; (ii) relationship between Y and LAB is a resource-type competition because both microbial groups share Glc as a main limiting substrate and they do not excrete metabolites affecting each other's growth [34,35]; and (iii) no impact of chemical and physical effects such as temperature and pH on the set of kinetic parameters.

Parameter estimation 2.4.1. Bayesian framework
In order to estimate the 24 parameters of the proposed model, a Bayesian framework was considered by fitting it to the experimental data. In this way, these parameters can be composed in a vector u of the form [u 1 , u 2 , . . . , u k ]. If it is assumed that u has given rise to the data D, the problem can be solved by inferring the posterior probability of u given D, P(u j D).
Assuming that there exist T time series in a model, with N independently measured data points at time j each, the posterior probability (P(u j D)) can be expressed as the product over all series and each data point within time series as: where D i,j is the data point from time series i measured at time j.  21 substrate saturation constant of AAB growth on EtOH 21 AAB-to-Ac from LA yield coefficient rsos.royalsocietypublishing.org R. Soc. open sci. 5: 180964 The deterministic model proposed in equations (2.1) -(2.8), can take the general form: where x i denotes each of the state variables and f (x i,j , u) the model prediction of the change of x i at time j as a function of the parameter vector u. Each experimental observation (D i,j ) can be considered as drawn from a normal distribution whose mean is equal to the model prediction f(x i,j , u), with a necessary standard deviation, s, that accounts for both experimental measurement error of the real observations and misprediction of the model. In this framework, each observation D i,j is drawn from a sampling distribution of the form thus allowing to reformulate the total posterior probability distribution in equation (2.9) as Using this form of the total posterior probability, in addition to the 24 parameters from equations (2.1) to (2.8), the total standard deviation, s, has also been estimated.

Variable scaling
For both collections of experimental data, the concentrations of microorganisms and metabolites differ by several orders of magnitude. As an example, after the transformation of CFU to biomass units in the experimental data of Camu et al. [4], the maximum concentration of AAB is approximately 0.0019 mg g( pulp) 21 , while the maximum concentration of the main substrate of AAB, EtOH, is approximately 22.4920 mg g( pulp) 21 . These different orders of magnitude between the state variables can lead to numerical issues during optimization. In order to reduce such issues, all state variables were scaled by dividing each of the time series in the experimental data by its own maximum value. Consequently, possible large differences between the parameters to be estimated are avoided and, most importantly, the search space can be constrained.
Hence, in a first step, the parameters were estimated using time series with a maximum value of 1 and, in a second step, re-scaled to their original physical units through conversion factors derived from equations (2.1) to (2.8) (see electronic supplementary material, S3).

Priors
By scaling the system to allow maximum values for each time series equal to unity, the large differences in orders of magnitude between the parameter estimates, e.g. maximum specific growth rates in the boundaries of fractions of milligrams with respect to yield coefficients that might take values of hundreds, are regularized; in this case, by introducing a scale that needs a prior distribution to be sampled within values between 0 and 1. In that way, an independent normal distribution with mean 0.5 and a standard deviation of 0.3 as prior choice for each k element of u represents a weakly informative prior by introducing scale information of the original units in which the parameters of the model are originally measured. For the standard deviation s, a Cauchy distribution C with location and scale parameters of 0 and 1, respectively, was used as prior distribution. This choice follows the same reasoning as depicted for the k independent priors for u, with the addition that the heavy tails of C allow for the sampling of extreme values which would account for outlying observations in the original data. To avoid the estimation of negative parameters, both priors are constrained to take values in the positive set of real numbers and are mathematically expressed as  for the eight state variables in equations (2.1) to (2.8) were provided as they were reported in the original works of Camu et al. [4] and Papalexandratou et al. [24]. Sampling for obtaining the posterior distributions of the unknown parameters as well as the model predictions was conducted using full Bayesian inference through the Markov chain Monte Carlo (MCMC) No-U-Turn sampler method [39]. The ODEs were specified and solved by the built-in mechanism of Stan rk45, which provides a fourthand fifth-order Runge -Kutta method for solving non-stiff systems [40,41]. All datasets were fitted by running four parallel Markov Chains of 3000 iterations each, 1000 of which were used as warm-up. Convergence of the sampling was determined by examining theR statistics computed by Stan.

Statistical analyses
Once the parameters of equations (2.1) -(2.8) were estimated, their posterior distributions obtained from fitting the model to each of the two trials, i.e. box 1 and box 2, reported by Papalexandratou et al. [24] were compared between each other. For doing so, an effect size statistic was used as proposed by Cohen [42], as a measure of the magnitude of either their relationship or difference. In that way, the effect size expressed as the standardized mean difference (d) of two independent continuous distributions was computed as where u k,1 and u k,2 are the sampled means corresponding to the parameter k obtained from box 1 and box 2, respectively, and S w k is the pooled within-groups standard deviation corresponding to the parameter k, which for the case of groups of equal sample sizes (in this study represented as the number of iterations of the MCMC sampler) is given by where S k,1 and S k,2 are the standard deviations of the posterior distribution of parameter k for box 1 and box 2, respectively. We used a threshold of jdj . 1.2, in order to identify significant differences between parameters [43].

Model's diagnostics
In all three datasets, the proposed model was fitted without major issues. The calculatedR statistic was 1 for all three cases (see electronic supplementary material, tables S4 -S6), showing that convergence of the MCMC sampler was accomplished. Such a behaviour is also noticeable in the obtained traceplots (see electronic supplementary material, figures S6 -S8), that show the typical 'caterpillar' shape as probe of a good mixing of the MCMC sampler along the exploration of the parameter space. Asymptotically, the ODE system converges to a stable fixed point (see electronic supplementary material, S5). In each of the data collections, despite the noisy nature of the experimental data and the low sampling rate, the corresponding simulations show the microbial succession previously reported by several studies for Y, LAB and AAB that emerges from the interplay of metabolites and microbial communities.

Metabolite and microbial population dynamics
Thus, most of the experimental observations reported by Camu et al. [4] and Papalexandratou et al. [24] fall within the computed 95% credible intervals of the simulations indicating that the model is capable of predicting the dynamics of metabolites in all cases, even in those where theoretical knowledge is not fully reflected on the data. This is the case for the time series of LA in the data reported by Camu et al. [4], where its concentrations seem to stay steady after 48 h of fermentation, which would contradict its consumption by AAB as reported by Pereira et al. [30].   96  72  48  24  0  144  120  96  72  48  24  0  144  120  96  72  48  24  0  144  120  96  72  48  24  0  144   120  96  72  48  24  0  144  120  96  72  48  24  0  144  120  96  72  48  24  0  144  120  96  72  48 (table 3). In the following paragraphs, our results will be structured in accordance to their type as (i) maximum specific growth rates, (ii) substrate saturation constants, (iii) mortality rates, and (iv) yield coefficients.

Maximum specific growth rates
With respect to the maximum specific growth rates, the estimated values mostly fall within their reported values in the literature, with the exception of three estimates belonging to particular trials. Specifically, for the maximum specific growth rate of Y on Glc (m Y Glc max ), the obtained parameter values ranged from 0.06 to 0.37 h 21 . These values agree with the reported ones for species of this microbial group, between 0.0781 and 0.53 h 21 [44][45][46][47][48][49][50][51] for Camu et al. [4] and box 2 [24] (green cells in table 3), while the estimate for box 1 [24] is far from the range in less than the estimate divided by 10 (orange cell in table 3). In a similar fashion, the estimated parameters for the maximum growth rate of LAB (m LAB max ) varied across the three datasets between 0. 36

Substrate saturation constants
About the substrate saturation constants, here denoted by K, their estimated values for the three datasets agree with reported ones in the literature in one out of three instances. For these comparisons, in several occurrences a unit transformation was necessary from their original units in which they were reported to milligrams of substrate per millilitre of medium (mg(substrate) ml 21 ), and assuming that one gram of pulp is equivalent to one millilitre of medium, since our estimates are given in mg(substrate) mg( pulp) 21 .   Table 3. Parameter estimates of the cocoa bean fermentation model using a Bayesian estimation framework. Means and standard deviations (in parenthesis) of the posterior distributions for the data of Camu et al. [4] and the fermentation boxes 1 and 2 of Papalexandratou et al. [24] are shown in columns Camu, P. box 1 and P. box 2, respectively. Additionally, the absolute value of the standardized mean difference (effect size jdj) between the estimates of P. box 1 and P. box 2 is shown. Effect sizes marked with (*) are significant (jdj .  21 , respectively. For K AAB LA , this considerably higher value for the upper limit of the range was obtained for the Camu et al. [4] data. This inflation of values was observed for other parameters, i.e. AAB-to-EtOH yield coefficient (Y EtOHjAAB ), AAB-to-LA yield coefficient (Y LAjAAB ) and AAB-to-Ac from LA yield coefficient (Y LA Ac jAAB ), of this dataset as well. and Papalexandratou et al. [24]. In the latter, the only estimate that differed much between box 1 and box 2 was the one corresponding to k Y .

Yield coefficients
Finally, a higher variability among the obtained parameter estimates was found for the yield coefficients, Y. These differences were notable between the two studies of Camu et al. [4] and Papalexandratou et al. [24], as well as the two trials (boxes 1 and 2) of the latter. In more detail, the estimated yield coefficient of Y-to-EtOH from Glc (Y Glc EtOH j Y ) was the only one that agreed in all three datasets with those reported in the literature (green cells in table 3). Their estimated values ranged between 7.44 and 11.94 mg(EtOH) mg(Y) 21 , falling in the referenced range of 1.39 -21.49 mg(EtOH) mg(Y) 21 [46,49,50,68]. For the yield coefficient of LAB-to-Glc (Y Glcj LAB ), the fits corresponding to the data of Camu et al. [4] and box 1 [24] were contained in the referenced range of 1.56 -66.67 mg(Glc) mg(LAB) 21 [59,67] (green cells in  table 3) with values of 29.23 and 20.22 mg(Glc) mg(LAB) 21 , respectively. The remaining estimate Y Glcj LAB for box 2 was far from the reported range in less than the estimate multiplied by 10, with a estimated mean of 3.32 mg(Glc) mg(LAB) 21 (orange cell in table 3).
By contrast, there are estimated yield coefficients that do not agree completely with previously reported values. On the one hand, the estimated value of 33.4 mg(Glc) mg(Y) 21 (green cell in table 3) for the yield coefficient of Y-to-Glc (Y Glcj Y ), in the dataset of Camu et al. [4] only, agree with the ranges of 1.56-66.67 mg(Glc) mg(Y) 21 [44][45][46]50,51,65,66]. Their counterparts from boxes 1 and 2 reported by Papalexandratou et al. [24], are away from the reported range in less than the estimate multiplied by 10 (orange cells in table 3 Moreover, the estimated yield coefficients for AAB-to-LA (Y LAj AAB ) do not agree in any of the data collections with the reference value of 7.94 mg(LA) mg(AAB) 21 [63] with values far from the reference more than 10 times the parameter for the datasets of Camu et al. [4] and box 1 of Papalexandratou et al. [24] (red cells in table 3) and one value far from the reference in less than the estimate divided by 10 for box 2 (orange cell in table 3).
Special attention needs to be given to the values of Y EtOHj AAB , Y LAj AAB and Y LA Ac j AAB for the data of Camu et al. [4], which showed inflated values that are not biologically plausible.
The effect of the measurement errors is discussed in electronic supplementary material, S6.

Statistical comparison of fermentation trials
The statistical comparison of the parameter estimates between the two fermentation trials conducted by Papalexandratou et al. [24] showed that significant differences exist among them (table 3), even though these were done under slightly different conditions in the same region. In this respect, the parameter estimates that showed such a significantly large difference depending on which trial they were derived from, correspond to m Y Glc max , m YFru max , K AAB LA , k Y , Y GlcjY , Y GlcjLAB , Y EtOHjAAB and Y LAjAAB . In the comparison of all these parameters, the computed absolute value of the standardized mean difference (effect size d) was greater than the threshold of 1.2 suggested by Sawilowsky [43]. Such a result leads to hypothesize that minor changes in both, methodologies and regions, affect the parameters of the model as it will be addressed in the discussion section.

Model fitting
As shown in the aforementioned results, our current model for cocoa bean fermentation is capable of reproducing each of the datasets with high accuracy. This means that the mechanistic assumptions made here are in accordance with the available biological knowledge to a considerably good degree, as it has been reflected in the conducted simulations. In this sense, our model represents a mechanistic approach which allows for a deep understanding of the transient responses of the process in a dynamic way, as opposed to current metabolic flux analyses [15,16] that assume steady-state metabolic conditions. Moreover, it represents a fully working kinetic model as opposed to a previous attempt which is capable of simulating metabolites and products time-courses only [17].
However, a detailed analysis of the resulting fits provides insight into the validity of some of the regulatory assumptions underlying the model, the relevance of additional effects not included in the present version of the model, as well as differences between the experimental set-ups behind the datasets.

Regulatory assumptions
By analysing the parameter estimates obtained here, important features of the model can be explored for its enhancement in future iterations. A Bayesian framework for parameter estimation, as used here, provides a scheme to investigate their uncertainty and determine their possible uniqueness. Therefore, it serves as a descriptive source for deriving the plausibility of the regulatory assumptions of the model.
In this sense, one of the particularities of the fitted model is the presence of strongly elevated estimates for the data of Camu et al. [4]. Specifically, the parameters showing such values were: (i) the substrate saturation constant K AAB LA and (ii) the yield coefficients Y EtOHjAAB , Y LAjAAB and Y LA Ac jAAB . Looking at the standard deviations (table 3, electronic supplementary material, S4, figures S9-S11) of their corresponding posterior distributions, it can be noted that there is a huge uncertainty in their values (large errors). These uncertainties reveal that the parameters cannot be uniquely estimated from these particular data, suggesting a practical non-identifiability of the parameters with the data reported by Camu et al. [4]. The reason for this characteristic can be threefold. Firstly, noise in the experimental data prevents a unique determination of the model's parameters because of an insufficient signal-to-noise response [70]. Secondly, the data may be incongruent particularly with the model mechanisms of growth of AAB on LA and the interactions AAB-EtOH. Finally, the estimated parameters might be correlated.
In our opinion, all elevated parameters related to the growth of AAB on LA, i.e. K AAB LA , Y LAj AAB and Y LA Ac j AAB , can be a result of noise in experimental data. Thus, as revealed by visual inspection of figure 3d where the real data does not reflect an accentuated decrease in the concentration of LA as opposed to the data reported by Papalexandratou et al. [24] (figures 4d and 5d), where such decrease exists after 72 h of the fermentation process. For the remaining elevated parameter which is related to the consumption of EtOH by AAB (Y EtOHjAAB ), a straight interpretation of its value of %1300 mg(Glc) mg(Y) 21 would imply that 1300 mg of EtOH are consumed by 1 mg of AAB; or in other words, that for generating 1 mg of AAB, 1300 mg of EtOH is required. Obviously, a yield coefficient of this order of magnitude is biologically implausible and for this reason it could be argued that the proposed model is not entirely capturing all the inherent mechanisms of the AAB-EtOH interaction in the fermentation process. A possible explanation of this specific inflated parameter value, is that Y EtOHjAAB is not only capturing the consumption of EtOH by AAB, but also possible physical processes such as evaporation of this rsos.royalsocietypublishing.org R. Soc. open sci. 5: 180964 metabolite. Temperature data, so far not implemented in the model, gave the reason for this hypothesis. More precisely, the data of Camu et al. [4] shows higher temperatures of the fermentation mass much earlier in the process compared to the data reported by Papalexandratou et al. [24]. In the first case, a temperature above 358C was reached right after 30 h and its maximum of %458C at 70 h of the fermentation process. In the latter, similar temperatures were reached at 40 and 80 h of the process. This disparity might explain why the estimated parameter values for Y EtOHj AAB in the Papalexandratou et al. [24] data collection are between 3 to 5 times less compared to the value obtained for the data of Camu et al. [4]. Finally, correlation between the estimated parameters might be playing an important role in the non-identifiability of these inflated parameters. In other words, interdependencies between different sets of parameters limit the MCMC sampler to freely explore the solution space. In that sense, from a simple correlation analysis (see electronic supplementary material, S7), we did not find remarkable patterns among the posterior probabilities of the parameter estimates, not even in the inflated ones.
According to these hypotheses, further iterations of the model should include additional physical effects, especially temperature. Moreover, pH conditions during the conduction of the fermentation should be also taken into account, as well as a non-dimensionalization of the model to identify correlated estimates and reduce their number.

Parameter conformance to values in the literature
The parameter ranges indicated as reported values in table 3 are in many cases referring to different experimental conditions and/or a specific microorganismal strain and might therefore not be directly comparable to the biological situation discussed here. We resorted to these values, whenever we failed to identify parameter values directly applicable to cocoa bean fermentation, in order to at least provide an order-of-magnitude estimate.
Hence, among the different estimated parameters, few did not agree with previously reported values in the literature as pointed out in §3.3. For the substrate saturation constants K Y Glc and K Y Fru , such a difference can be explained by the fact that the reported values correspond to the growth of Y under aerobic conditions, as opposed to the anaerobic earlier stage of cocoa bean fermentation where Y's growth takes place. Consequently, higher values for these parameters as reported here, reflect a slow uptake rate of Glc and Fru by Y under anaerobic conditions. Finally, from a general point of view, the various discrepancies between the obtained parameter estimates for the growth rates and yield coefficients with their values reported in the literature for single species confirms the high growth rates and yields coefficients that mixtures of microorganisms might show in fermentation processes [71].

Comparison of parameter estimates
As mentioned in §2.1, the study performed by Papalexandratou et al. [24], involved two fermentation trials conducted in two cocoa-producing farms in Brazil belonging to the same region. These trials denoted as 'box 1' and 'box 2', differed between each other in minor aspects. After we identified eight significantly different estimates between these trials (table 3), in the following paragraphs, three main differences were taken into account in order to formulate hypotheses on how the parameter estimates are affected when applying the same fermentation method, i.e. wooden boxes, under similar environmental conditions in distinct fermentations. The three main differences between the trials are: (i) initial concentration of microorganisms, (ii) concentration ratios of initial substrates, and (iii) evaporation rates.

Initial concentration of microorganisms
The initial concentrations of microorganisms between boxes 1 and 2 did not differ much, with the exception of Y. For Y, the initial concentrations in boxes 1 and 2 were equal to 0.18 and 0.005 mg g 21 , respectively. This difference as well as the microbial diversity that has been seen along different fermentation trials [2] determined that the estimates of these growth rates differ between each fermentation trial; with higher values of the growth rates m Y Glc max and m YFru max for box 2 than for box 1. A similar effect is evident in the yield coefficient related to the growth of Y on Glc (Y GlcjY ), where box 1 showed a higher value than the one obtained for box 2 as a consequence of the higher initial amount of Y in box 1. In other words, a higher initial concentration of Y determines the estimation of lower maximum specific growth rates as well as higher estimates for the yield coefficient of Y on Glc. This can be explained that given a high initial microbial population, it needs a lower cell division rate to reach the maximum described by the observed data and an increased rate of uptake of its substrate.

. Concentration ratios of initial substrates
Worthy of attention, was the difference in the initial concentrations of the main substrates Glc and Fru which might play an important role in the growth of Y and LAB. For box 1, the initial concentrations of Glc and Fru are 55.482 and 49.669 mg g 21 , respectively; while for box 2, these are 42.936 and 67.249 mg g 21 , respectively. This ratio would explain the significant difference in the yield coefficient of the growth of LAB on Glc (Y GlcjLAB ) between boxes 1 and 2. In box 2, this yield coefficient was estimated significantly lower than for box 1 which leads to the hypothesis that this phenomena might be the result of a growing population of Y restraining the access to Glc to the LAB microbial group. This final hypothesis coincides with our assumption of resource-type competition between Y and LAB [34,35], which determined a less successful Y Glcj LAB for LAB under a lower initial concentration of its main substrate Glc in box 2.

Evaporation rates
Among the subtle differences between boxes 1 and 2 reported by Papalexandratou et al. [24], the final one to be considered is the possible uneven evaporation rates between them. With this in mind, it is not deceitful to expect a higher evaporation rate of volatile metabolites in a fermenting mass protected only by a metal roof, as reported for box 1, than in a fermenting mass held within a fermentary room, as reported for box 2. In this respect, the higher estimated yield coefficient of growth of AAB on EtOH (Y EtOHjAAB ) in box 1 than in box 2, might be explained for a possibly greater evaporation rate of EtOH in box 1, similar to the possible explanation of its inflated counterpart determined for the Camu et al. [4] dataset. A similarly higher evaporation rate for LA and Ac in box 1 could be the explanation for the differences among the remaining parameter estimates, i.e. the Contois substrate saturation constant for the growth of AAB on LA (K AAB LA ), the yield coefficient of consumption of LA by AAB (Y LAjAAB ) as well as the large variances of the yield coefficients of the production of Ac from EtOH and LA by AAB (Y EtOH Ac j AAB and Y LA Ac j AAB , respectively). This hypothesis can be also extended to the lower value in the mortality rate term of Y (k Y ) observed in box 1 than in box 2, where EtOH loses its effect on decreasing Y's population due to a higher evaporation. In other words, the mortality rate of Y might be lowered in the presence of an increasing evaporation rate of EtOH in the fermenting mass.

Conclusion
The model presented here is a first biochemically plausible, ODE-based kinetic model of cocoa bean fermentation capable of reproducing the known sequential activation of microbial communities and capable of fitting available experimental data to an acceptable degree. However, it is necessarily a simplification of the diverse biological processes involved in cocoa bean fermentation. The remaining discrepancies between model prediction and experimental data, as well as those parameter values outside the biologically plausible ranges, point to the fact that relevant aspects of the processes have not been taken into account. Based on the model features, we can hypothesize that the following regulatory mechanisms might exist: (i) resource-type competition between Y and LAB, (ii) microbial death is determined to a good degree by direct action of fermentation products upon their respective producing microorganisms and (iii) chemical and physical factors intervene in the decrement of volatile products, i.e. EtOH and LA, rather than microbial activities only.
This mathematical model allows relating observed microbial population sizes and concentrations of the five chemical compounds considered here, i.e. Glc, Fru, EtOH, LA and Ac, during the time course of fermentation with growth rates, mortality rates, substrate saturation constants and yield coefficients as intrinsic systemic parameters.
Additionally, the capability of the model to 'reverse-engineer' differences from the observed time courses of two trials conducted in the same region under the same methodology showed how these systemic parameters might be affected by minor changes between one and other fermentation trial. The cocoa and chocolate markets require a steady flow of high-quality raw material resulting from diverse types of fermentation. Although fermentation practices will remain locally determined, modelbased recommendations to farmers on practices and use of specific starter cultures might help increase the quality of cocoa bean raw material prior to shipment. This will ultimately increase the sustainability of cocoa bean supply. Subsequent versions of the model should include additional chemical and physical effects, such as temperature and pH dependence of kinetic parameters, the spatial heterogeneity of a fermentation pile, impact of additional (commonly occurring) microorganisms, as well as a further compartmentalization including the inner bean and the incorporation of sucrose as an additional carbon source, serving also as a (time-delayed) source of glucose and fructose.
Besides the extension by further chemical and physical effects, eventually such a kinetic model needs to be interfaced with the more microscopic, metabolic perspective put forward in other studies [15,16]. Should high-quality genome-scale metabolic models be available, flux balance analysis [72] may provide a suitable theoretical framework for such an approach on three levels: (i) A metabolic pathway analysis of synergies and competitions (e.g. using the methodology from [73]) may point to additional modes of interaction among the species involved. (ii) A detailed exploration of the emerging pattern of chemical compounds as a function of the fermentation time course may become feasible by incorporating the biochemical interactions of the cocoa bean and the microorganisms. (iii) The relevance of a larger diversity in microorganisms (e.g. different yeast strains) can be assessed. With the availability of genome-scale metabolic models currently developing rapidly [74], we expect this avenue of research to become feasible in the very near future.
The recent finding [75] about the metabolic interplay of yeast and LAB is an example of the richness of this metabolic foundation underlying the dynamics leading to the successful fermentation of a cocoa bean.