Misuse of Beer–Lambert Law and other calibration curves

Calibration curves allow instrument calibration by predicting the concentration of an analyte in a sample from the reading of the instrument. This curve is constructed as the regression straight line that best fits the relationship between some known concentration standards and their respective instrument readings. An example is the Beer–Lambert Law, used to predict the concentration of a new sample from its absorbance obtained by spectrometry. The issue is that usually this methodology is misapplied. In this paper, we want to clarify this point, explaining what the error consists of and how (easily) to fix it, with the intention of ensuring that it does not continue to be reproduced in the experimental scientific work.


Introduction
Instrument calibration involves the construction of a calibration curve that allows one to predict the concentration of an analyte in a sample from the reading of an instrument. This curve is the linear regression model that 'best fits' the relationship between some known concentration standards and the respective instrument responses. Of course, the effectiveness of the calibration procedure will depend on whether the relationship between the concentration and the instrument reading is indeed (approximately) linear. If it is, bivariate regression may be used to address the issue of predicting the output or dependent variable, say Y, from the input, regressor or independent variable X, by fitting a straight line to a scatterplot of observations on both variables, with the values of the variable X on the x-axis (abscissa), and those of the variable Y on the y-axis (ordinate). The best straight line, in the sense of minimizing the sum of the squared errors of prediction, has an expression of the form $Y = \beta_0 + \beta_1 X$, with intercept $\beta_0$ and slope $\beta_1$ (see appendix A).
A paradigmatic example is the very popular Beer-Lambert Law (also known as Beer's Law), which establishes that, under ideal conditions, the absorbance of a solution of an absorbing substance obtained by spectrometry techniques is directly proportional to the substance's concentration. This implies that the absorbance increases with the concentration, because a highly concentrated solution absorbs more light than a dilute one, and that it does so in a linear way. This relationship between absorbance and concentration is used not only by chemists, but also by experimental scientists of many other disciplines. Details of what this law says are given in §2.
There are innumerable works that collect research in which Beer's Law has been applied in very diverse fields that use the technique of spectrometry. Just to mention a few of them: in [1] the authors obtain the absorbance of some samples of glucose extracted from three different types of fruit peel waste using UV-Vis spectroscopy and, from it and by means of Beer's Law, they obtain the corresponding concentrations and compare them. In [2] the authors say verbatim that 'The significance of Beer-Lambert Law is to measure the absorbance of a particular sample and to infer the concentration of the solution'. They use a spectrometer for measuring the absorbance of three macronutrients that are essential for plant growth (nitrogen, phosphorus and potassium) and are commonly used in fertilizers, in non-agriculture soil. As the quantity of fertilizer has to be estimated based on the requirements for optimum production, they apply the Beer-Lambert Law to determine the nutrient concentrations. Paper [3] explains a study for the determination of the amount of manganese metal present in tricalcium phosphate, using a flame atomic absorption spectrophotometer to observe the corresponding absorbance, by means of the calibration curve. The authors of [4] carry out an experiment to introduce a method to estimate amlodipine in pure drug and marketed tablet formulation, consisting of the use of a calibration curve derived from Beer's Law to obtain the concentration from the absorbance. Andriamahenina et al. [5] investigate the effect of the presence of outliers in the calibration of lead by graphite furnace atomic absorption spectrometry, concluding that outliers worsen the quality of the measurement of the concentration of lead obtained, by using the calibration curve, from the absorbance given by the instrument reading.
A non-invasive alternative of blood glucose monitoring is introduced in [6], based on the detection of the optical density of the solution samples by means of a spectrophotometer, and then converting it into the corresponding glucose concentration by using the Beer-Lambert Law, with the help of a concentration curve. In [7] Ocean Optics Ocean View spectrometer operating software is used to obtain and process data from spectrometer, and get the transmittance (then, the absorbance) of a uric acid solution, from which to calculate uric acid concentration by using a concentration curve. The authors of [8] present and validate a quick and sensitive spectrophotometric method for quantitative determination of gliquidone in bulk drug, pharmaceutical formulations and human serum, based on the absorbance readings and their transformation into concentration through a calibration curve of the absorbance over the concentration. Restrepo et al. [9] report an easy methodology to construct handmade solar cells to produce clean energy from chlorophyll-a (chl-a) extracted from the leaves of Diacol Capiro potato. A spectroscopic calibration curve was constructed using different chl-a standard solutions and their absorbances. In [10] a quality-by-design (QbD) approach was implemented for the routine quality control analysis of serotonin in pharmaceutical dosage form through a spectroscopic method, by using a calibration curve of the absorbance over the concentration.
Although very common, Beer's Law is not the only source of application of calibration curves in different fields. For example, in the very recent paper [11] the authors construct calibration curves for the total protein eluted from membranes with respect to the concentrations of Bevacizumab or Trastuzumab added to the serum used to load the membranes. The total protein eluted from membranes is determined by measuring native fluorescence, and then the concentration of Bevacizumab or Trastuzumab is determined using the calibration curve.
The problem of the proper use of calibration curves is common to many engineering and science applications, but it has received little attention from statisticians, with some exceptions (see ch. 15 in [12], for example, and references therein). The objective of this work is to show simply, without too many technicalities and in a way accessible to engineers and experimental scientists, how calibration curves are misused, and to explain how to (easily) correct this pitfall, which could have undesirable consequences. This issue has been treated before, although not always with the same success (see details in §4), but it is still worth reporting and publicizing to prevent it from spreading further among experimental scientists. Probably, in most cases this error has no practical importance and does not invalidate the published studies, since there will be little difference between the results obtained using the wrong calibration curve (classical calibration) and those obtained using the proper one (inverse regression). Nevertheless, the error is still worth noting, for three main reasons: (a) because regardless of the practical implications, from a conceptual point of view, the statistical methodology must be used in the appropriate way; (b) because a priori it is not possible to know the extent of the repercussions of the misuse of the calibration curve on the results of an experiment; (c) because an error does not cease to be one even though it is widespread and commonly accepted.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211103
The organization of the rest of the paper is as follows: in §3 we explain the misuse of the Beer-Lambert Law and other calibration curves. Section 4 details how to fix this problem, and a toy example of calibration is developed in §5 to show how the two calibration curves are applied, and compare them. Section 6 includes a few words in conclusion and an outline of what calibration curve is appropriate in every situation in figure 6. Finally, in appendix A we recall the main formulae of the linear regression model, and in appendix B we show two more examples of calibration, one with real experimental data and the other using simulation.

The Beer-Lambert Law
A spectrophotometer is an instrument that measures the number of photons delivered by a solution of a chemical species that absorbs light of a particular wavelength in a given unit of time, which is called the intensity, making it possible to compare the intensity of the beam of light entering the solution ($I_0$) with the intensity of the beam of light exiting it ($I$). The ratio of these intensities is called transmittance, and is denoted by the letter $T$. That is, $T = I/I_0$. If the transmittance is a measure of the quantity of photons passing through a solution (the proportion of the intensity of the light entering the solution that exits), the absorbance $A$ is a measure of how much light is absorbed by the solution, and is defined as a function of the transmittance in this way:

$$A = -\log_{10}(T) \qquad (2.1)$$

(large values of absorbance are associated with very little light passing through the solution and, conversely, small values of absorbance are associated with most of the light passing entirely through it). When passing a beam of light of the appropriate wavelength through the solution, if it is fairly dilute, the photons will encounter a small number of the absorbing chemical species and then we can expect a high transmittance and low absorbance. On the contrary, if the solution is highly concentrated we will expect a higher number of the absorbing chemical species and a low transmittance and high absorbance. This leads us to think that the absorbance could be a monotonic increasing function of the concentration of the solution, and even that it could be (directly) proportional to it. Likewise, it seems that the absorbance would increase if the beam of light goes through the solution for a longer period of time, and since the speed of light is constant, we could think that the absorbance is also directly proportional to the path length of the beam through the solution.
In this way we come to the (deterministic) Beer-Lambert Law, which states the following:

$$\text{The Beer-Lambert Law:} \quad A = \varepsilon L c, \qquad (2.2)$$

where $c$ is the concentration of the absorbing species in the solution, $L$ is the path length of the beam through the sample compartment where the solution is, and $\varepsilon$ is the proportionality constant. If the path length $L$ is reported in centimetres (cm), and the concentration $c$ is reported in molarity (moles per litre, mol l$^{-1}$), the proportionality constant $\varepsilon$ is called the molar absorptivity or molar extinction coefficient, and has units of litres per mole-centimetre (l (mol cm)$^{-1}$). In this way, when multiplying $\varepsilon$, $L$ and $c$, all the units cancel and it follows that the absorbance $A$ is unit-less. Note that $\varepsilon$ is intrinsic to the absorption of the solution of chemical species at a particular wavelength of light. If, in a given context, we know three of the four quantities that appear in equation (2.2), we can solve for the value of the fourth. We could obtain the absorbance $A$ of a solution from its concentration $c$, knowing the other two quantities $L$ and $\varepsilon$, simply by substituting into (2.2). Or vice versa, knowing the absorbance of the solution at a given wavelength, usually from the transmittance by using equation (2.1), we could obtain the concentration by solving for $c$ in equation (2.2),

$$c = \frac{A}{\varepsilon L}.$$

The crux of the issue appears when the product of the molar absorptivity and the path length, $\varepsilon L$, which is constant for a given solution ($\varepsilon$) and as long as the same sample compartment is used to make measurements ($L$), is not known. Then, in order to determine the concentration $c$ of the solution given its absorbance value $A$, a calibration curve needs to be constructed. And it is at this point that the source of the error appears, as will be described in the next section.
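The deterministic use of the law described above, when the product of the molar absorptivity and the path length is known, can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values (a transmittance of 0.10, a molar absorptivity of 5000 l (mol cm)$^{-1}$ and a path length of 1 cm) are assumptions made for the example and do not come from any experiment in this paper.

```python
import math


def absorbance_from_transmittance(transmittance: float) -> float:
    """Equation (2.1): A = -log10(T), with T = I / I0 in (0, 1]."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must lie in (0, 1]")
    return -math.log10(transmittance)


def concentration_from_absorbance(absorbance: float,
                                  molar_absorptivity: float,
                                  path_length_cm: float) -> float:
    """Solve the Beer-Lambert Law A = epsilon * L * c for c (in mol/l)."""
    return absorbance / (molar_absorptivity * path_length_cm)


# Hypothetical reading: 10% of the incoming light exits the solution.
A = absorbance_from_transmittance(0.10)

# Hypothetical constants: epsilon = 5000 l/(mol cm), L = 1 cm.
c = concentration_from_absorbance(A, 5000.0, 1.0)

print(A, c)
```

When the product of the two constants is not known, this direct computation is not available, which is what motivates the calibration curve.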

Misuse of the calibration curves
What is this widespread error? When the (constant) value of $\varepsilon L$ is not known, the following misuse of the Beer-Lambert Law is usually committed: in order to construct the calibration curve to predict the concentration of an unknown solution from its known absorbance, a set of standard concentrations within the range of the measuring instrument is prepared, and the corresponding absorbances are determined by spectrometry, say $(c_1, A_1), (c_2, A_2), \ldots, (c_n, A_n)$. Then the equation of the regression straight line for the response variable absorbance and predictor variable (or regressor) concentration that best fits these $n$ points is

$$A = \beta_0 + \beta_1 c, \qquad (3.1)$$

where $\beta_1$ is the slope of the line and $\beta_0$ is the y-intercept; both are obtained from the $n$ points by means of the linear least-squares method and are given by the formulae

$$\beta_1 = \frac{\sum_{i=1}^{n} (c_i - \bar{c})(A_i - \bar{A})}{\sum_{i=1}^{n} (c_i - \bar{c})^2}, \qquad \beta_0 = \bar{A} - \beta_1 \bar{c}. \qquad (3.2)$$

Now, if we denote by $\hat{A}_i$ the prediction of the absorbance given by the straight line for a solution whose concentration is that corresponding to the $i$th point, $c_i$, it is obtained by substituting $c_i$ into equation (3.1), that is, $\hat{A}_i = \beta_0 + \beta_1 c_i$. The difference (error) between the observed and the predicted absorbance for the solution with concentration $c_i$ is $e_i = A_i - \hat{A}_i$, and the formulae in (3.2) are obtained by imposing that the sum of the squares of the errors be minimum:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (A_i - \beta_0 - \beta_1 c_i)^2. \qquad (3.3)$$

That is, if absorbance $A$ is plotted versus concentration $c$ for the series of $n$ known solutions, with the dependent variable $A$ placed on the y-axis and the independent variable $c$ on the x-axis, the calibration curve (3.1) is the straight line that best fits the $n$ points in the plane in the sense of minimizing the sum of the squares of the vertical distances from each point to its prediction (figure 1).
Calibration curve (3.1) is therefore intended for predicting the absorbance of new solutions for which concentrations are known, since with the parameters $\beta_0$ and $\beta_1$ given by (3.2), it ensures that the sum of the squares of the errors committed in prediction for the $n$ initial solutions is as low as possible. Then, given the concentration of a new solution, say $c_0$, we can obtain the predicted absorbance value for it, $\hat{A}_0$, from equation (3.1) by substituting the concentration $c_0$, that is, $\hat{A}_0 = \beta_0 + \beta_1 c_0$. However, in what is known as classical calibration, (3.1) is usually used inappropriately to predict the concentration of new solutions for which absorbances are known, in the following way: first finding the y-value on the regression straight line corresponding to the measured absorbance, and then tracing downward to see which concentration matches up to it; this value will be the predicted concentration of the solution with that absorbance (figure 2b).
That is, given the absorbance value of a new unknown solution, say $A_0$, the usual (wrong) practice is to obtain the predicted concentration value for it, $\hat{c}_0$, from equation (3.1) by substituting the absorbance $A_0$ and solving for the concentration, that is,

$$\hat{c}_0 = \frac{A_0 - \beta_0}{\beta_1} = b + m A_0, \qquad (3.4)$$

where $b = -\beta_0/\beta_1$ and $m = 1/\beta_1$, with $\beta_0$ close to zero and $\beta_1$ an estimate of the unknown product $\varepsilon L$, both computed using the formulae in (3.2). If we predict the concentration for the $i$th point given its absorbance in this (wrong) way, we obtain

$$\hat{c}_i = b + m A_i. \qquad (3.5)$$

But then the sum of squared errors (differences between observed and predicted concentrations) is

$$\sum_{i=1}^{n} (c_i - \hat{c}_i)^2 = \sum_{i=1}^{n} (c_i - b - m A_i)^2,$$

and we do not have any optimality result, in the sense that we cannot ensure that it is as small as possible with $\beta_0$ and $\beta_1$ given by (3.2), unlike what happens with (3.3). In summary: it is algebraically possible to predict the concentration from the absorbance by using the calibration curve of the absorbance $A$ over the concentration $c$ given by (3.1), following the expression (3.4) with $\beta_0$ and $\beta_1$ given by (3.2), as in figure 2b. This is the classical calibration approach. But this is not the optimal way, since we do not control the prediction errors that are committed. Therefore, this procedure should be avoided. Instead, it is advisable to reserve (3.1) exclusively for predicting the absorbance from the concentration, because for that purpose it achieves the minimum sum of the squared prediction errors (figure 2a).
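The classical calibration procedure just described can be sketched as follows. The standards below are made-up values chosen only for illustration (they are not the data of any table in this paper), and the helper names `fit_line` and `classical_prediction` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical standards (illustrative only): concentrations and absorbances.
conc = [20.0, 40.0, 60.0, 80.0, 100.0]
absb = [0.011, 0.018, 0.032, 0.037, 0.052]

# Curve of A over c: least squares with the absorbance as response.
beta0, beta1 = fit_line(conc, absb)

# Coefficients of the algebraically inverted line, c = b + m * A.
b = -beta0 / beta1
m = 1.0 / beta1


def classical_prediction(a0):
    """Classical calibration: read the concentration off the A-over-c line."""
    return b + m * a0


a_new = 0.040                      # hypothetical absorbance of a new sample
c_hat = classical_prediction(a_new)
print(c_hat)
```

The predicted point lies exactly on the fitted A-over-c line, but nothing in this construction minimizes the errors measured in concentration.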

Easily fixing it
The problem is easily solvable: since it is a question of constructing a calibration curve to predict the concentration of a new solution of which the absorbance is known, from the concentrations and absorbances of the initial known solutions, the regression straight line of the concentration $c$ over the absorbance $A$ is the proper one to use, since it is the one that minimizes the sum of the squared errors of prediction (ordinary least squares, OLS). From the known concentrations and absorbances of the set of $n$ solutions, we obtain the equation of the regression straight line for the response variable concentration and the regressor absorbance,

$$c = \alpha_0 + \alpha_1 A, \qquad (4.1)$$

with the slope $\alpha_1$, which is an estimate of $(\varepsilon L)^{-1}$, and the intercept $\alpha_0$ (close to zero) obtained from the formulae

$$\alpha_1 = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(c_i - \bar{c})}{\sum_{i=1}^{n} (A_i - \bar{A})^2}, \qquad \alpha_0 = \bar{c} - \alpha_1 \bar{A}. \qquad (4.2)$$

Given the absorbance corresponding to the $i$th point, $A_i$, the predicted concentration is

$$\hat{c}_i = \alpha_0 + \alpha_1 A_i, \qquad (4.3)$$

and then the difference (error) between the observed and the predicted concentration for the solution with absorbance $A_i$ is $\epsilon_i = c_i - \hat{c}_i$. In this case the formulae in (4.2) are obtained by imposing that the following sum of the squares of the errors be minimum:

$$\sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (c_i - \alpha_0 - \alpha_1 A_i)^2$$

(see figure 3). Note that the two straight lines (4.1) and (3.1) intersect at the point $(\bar{c}, \bar{A})$. Given the absorbance of a new solution, say $A_0$, we can obtain the predicted concentration value for it, $\hat{c}_0$, from equation (4.1) by substituting the absorbance $A_0$ in this direct way:

$$\hat{c}_0 = \alpha_0 + \alpha_1 A_0, \qquad (4.4)$$

and if we compare (4.4) with (3.4) we realize that, in general, $\alpha_0 \neq b$ and $\alpha_1 \neq m$, that is, the two approaches are not equivalent, as can be seen graphically in figure 4. Since we are interested in minimizing the sum of the squared errors of prediction, it is then evident that the proper calibration curve is (4.1) and not (3.1). This approach is known as inverse regression, a name that goes back to [13].
Figure 1. Calibration curve of A over c properly used to predict absorbance A from concentration c.
It is perfectly adequate in terms of prediction errors, since the OLS method does not depend on any additional hypotheses about the regression model, being the optimal approach in the sense of minimizing the sum of the squared errors.
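A minimal sketch comparing the two approaches on the same calibration sample is given next. The data are again made-up values used only for illustration, and the helper names are hypothetical. Because the line of c over A is the least-squares line for the concentrations, its sum of squared errors in concentration cannot exceed that of the inverted line of A over c.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sse(observed, predicted):
    """Sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical standards (illustrative only).
conc = [20.0, 40.0, 60.0, 80.0, 100.0]
absb = [0.011, 0.018, 0.032, 0.037, 0.052]

beta0, beta1 = fit_line(conc, absb)      # classical: A over c, then inverted
alpha0, alpha1 = fit_line(absb, conc)    # inverse regression: c over A

pred_classical = [(a - beta0) / beta1 for a in absb]
pred_inverse = [alpha0 + alpha1 * a for a in absb]

sse_classical = sse(conc, pred_classical)
sse_inverse = sse(conc, pred_inverse)

# The product of the two slopes equals the squared correlation coefficient,
# so alpha1 equals m = 1 / beta1 only when the points are perfectly collinear.
r_squared = alpha1 * beta1

print(sse_classical, sse_inverse, r_squared)
```

On the calibration sample itself, the two sums of squares differ by the factor $1/r^2$, where $r$ is the sample correlation between concentration and absorbance, which is why the difference is small when the fit is very good and grows as the scatter increases.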
However, it is true that to make statistical inferences about the linear regression model (confidence intervals and tests of hypothesis on the coefficients of the regression straight line), some hypotheses are assumed (see appendix A for details), being the most basic that the regressor is measured without error, and that the response variable is randomly distributed following a normal distribution with mean a linear function of the regressor, and constant variance. We will call them: LR hypotheses (by linear regression). If we are interested in making statistical inferences about the regression model, we have to design the experiment to collect data in such a way that these assumptions are reasonably fulfilled. In our case, this means that absorbances have to be measured with precision while concentrations are measured with non-negligible error, which in practice may not be possible, and this is considered in the literature the weak point of the inverse regression approach. Indeed, in the opinion of Parker et al. [14], for example, the observed measurements (absorbances) in practical calibrations are subject to measurement error, violating the LR hypotheses.
What if the LR hypotheses with the absorbance as regressor and the concentration as response, corresponding to the inverse regression approach, are not fulfilled, not even roughly? In our opinion, nothing invalidates this approach, for the following reasons: (1) The hypotheses are needed if we want to make statistical inference about the model, not to make predictions, which can be carried out regardless. (2) The convenience of using the inverse regression approach relies on OLS, which does not depend on any hypothesis but only on the errors of prediction, which allow the predictive capacity of any model to be evaluated. (3) The greater predictive power of the inverse regression, compared with that of classical calibration, supports its use and is shown empirically in this work by a toy example in §5 and two more examples in appendix B, one with real experimental data and the other built by simulation.
Likewise, it has also been described in some works. In this regard, [13] compared classical calibration (named there Method A) and inverse regression (Method B) using simulations, and recommended the latter based on the mean squared error. The authors of [14] also arrived at the same conclusions through some simulation studies (see also references therein in the same vein), although other authors criticize that recommendation. For example, in the recent paper [15], the authors introduced a new methodology, the 'reverse inverse regression' to address the same problem, assuming that the inputs (concentration values) vary according to Gaussian distributions, which allow them to derive some statistical properties, and criticize the inverse regression approach based on the treatment of the inputs (absorbance values) as determined with small error. But they compare their method against classical calibration and inverse regression using a simulation study, and have to recognize the best behaviour of the latter in the sense of minimizing the variance of the prediction interval.
In brief, leaving aside assumptions that may or may not hold (and that, when they do hold, allow some statistical properties of the linear regression model to be deduced), if we are interested in prediction, the best approach nonetheless seems to be inverse regression.

A toy example
We prepare a set of $n$ (= 10) standards within the range of the measuring instrument, with the following made-up values of concentration (in mg l$^{-1}$) and absorbance, recorded in table 1.
The two calibration curves given by (3.2) and (4.2) are the classical calibration curve (curve of $A$ over $c$), $A = \beta_0 + \beta_1 c$, and the inverse regression curve (curve of $c$ over $A$), $c = \alpha_0 + \alpha_1 A$, with the coefficients computed from the data in table 1. We can observe in figure 5 that indeed, as explained above, the two curves are not the same, and they intersect at the point $(\bar{c} = 110, \bar{A} = 0.04884)$. Moreover, the values of the R-squared ($R^2$) have also been reported for the two calibration curves, with that of the inverse regression approach being the higher of the two when predicting concentration from absorbance. $R^2$ represents the proportion of variation in the response variable that is explained by the calibration curve (the higher the better).
Note that $R^2 = 1 - (\mathrm{SSE}/\mathrm{SST})$, where SSE and SST denote the sum of squared errors and the total sum of squares, respectively, that is, $\mathrm{SSE} = \sum_{i=1}^{n} (c_i - \hat{c}_i)^2$ and $\mathrm{SST} = \sum_{i=1}^{n} c_i^2 - n\,\bar{c}^{\,2}$, where $\hat{c}_i$ is the prediction for the concentration of the $i$th solution (with absorbance $A_i$), which is given by (3.5) for the classical calibration approach, but by (4.3) for the inverse regression. In table 2, we report the predictions $\hat{c}_i$ with the two approaches.
As expected, the proper calibration curve (that of c over A) has lower standard error (s.e.) and higher R 2 than the usual one (the calibration curve of A over c), to predict concentration from absorbance, which confirms the theoretical result that states that it is better. In other words, inverse regression is better than classical calibration in the sense of minimizing the sum of squared errors in prediction, and this conclusion is independent of the hypotheses of the linear regression model.
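The definition of $R^2$ used above can be sketched as a small function that applies to the predictions of either approach. The concentrations 20, 40, ..., 200 mg l$^{-1}$ used below are an assumption: they are chosen because they reproduce the mean concentration of 110 and the total sum of squares of 33 000 reported for the toy example, but they should not be read as the actual entries of table 1. The function name is hypothetical.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST = sum(c_i^2) - n * mean(c)^2."""
    n = len(observed)
    mean = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum(o ** 2 for o in observed) - n * mean ** 2
    return 1.0 - sse / sst


# Assumed concentrations (mg/l), consistent with mean 110 and SST 33 000.
conc = [20.0 * k for k in range(1, 11)]
mean_conc = sum(conc) / len(conc)
sst = sum(c ** 2 for c in conc) - len(conc) * mean_conc ** 2

print(mean_conc, sst)

# Two limiting cases: perfect predictions, and always predicting the mean.
print(r_squared(conc, conc))
print(r_squared(conc, [mean_conc] * len(conc)))
```

Passing the classical predictions (3.5) or the inverse regression predictions (4.3) as the second argument gives the two values of $R^2$ that are compared in the text.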
One way to see if the differences in prediction errors are statistically significant is as follows: consider the differences of the absolute values of the prediction errors with the two approaches (last column in table 2). For this sample of size 10, we can perform a goodness-of-fit test for normality (Shapiro-Wilk test), obtaining a p-value of 0.915, which does not allow us to reject the hypothesis of normality, so we apply the one-sided t-test to compare the mean against 0, giving a p-value of 0.07094*. This p-value is not less than 0.05, but it is not very far off (it is less than 0.10), so there is weak statistical evidence that the differences of the absolute values of the prediction errors are positive on average or, what is the same, that on average the errors with the classical calibration approach are greater in absolute value than with the inverse regression. Since in practical calibrations the errors in making the predictions are among the most important measures of the goodness of the calibration method, in table 3 we also record the values of the radius of the prediction intervals.
Table 2. Toy example: predictions with the two methods, classical calibration and inverse regression, and corresponding prediction errors with the difference of the absolute value of the errors. In italics the maximum $R^2$ and the minimum standard error (s.e.), as well as the p-value for the one-sided t-test in favour of the hypothesis that the mean of the differences is greater than 0.
For any absorbance $A_i$, the corresponding prediction intervals are of the form $\hat{c}_i \pm (a)$ using the classical calibration (the expression for (a), which has been derived with the approximate delta method, can be found in (A 6), appendix A), and $\hat{c}_i \pm (b)$ with the inverse regression, where (b) is the $t_{n-2}$-based radius given by (A 5) in appendix A. Note that both (a) and (b) in table 3 are deduced from the assumptions of the linear regression model; therefore, they will be more or less accurate, depending on the degree of compliance with the LR hypotheses. In any case, for all absorbance values, the estimated radius of the prediction interval is greater with the classical calibration than with the inverse regression. This fact is statistically significant: if the two methods were equivalent from the perspective of the prediction interval error, or if the classical calibration were better, the probability that for all 10 absorbance values the prediction interval radius with the inverse regression is less than the corresponding one with the classical calibration is bounded above by $(1/2)^{10} \approx 0.00098$, which is a very low p-value (corresponding to the exact binomial test). This means that, if the inverse regression were not better than the classical calibration in the sense of having less prediction error, the probability that all 10 prediction interval radii with the inverse regression are less than the corresponding ones with the classical calibration would be very low, so that assumption must be rejected, and we accept that inverse regression is statistically significantly better than classical calibration. The same conclusion is reached by performing a one-sided t-test to compare the mean of the differences (a)-(b) with 0, with a p-value of $1.998 \times 10^{-12}$*** in favour of the mean being greater than 0 or, equivalently, of the radii of the prediction intervals being greater, on average, for the classical calibration than for the inverse regression.
The t-test is performed after using a Shapiro-Wilk test of normality, whose p-value is: 0.1859.
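The bound used in the exact binomial test mentioned above can be checked with a short computation. The sketch assumes that, if neither method were better, each of the 10 comparisons would favour the inverse regression with probability 1/2, independently of the others; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# All 10 prediction-interval radii are smaller with the inverse regression.
p_value = binomial_upper_tail(10, 10)
print(p_value)  # 0.0009765625
```

If the classical calibration were actually the better method, the probability of a single comparison favouring the inverse regression would be below 1/2 and the tail probability would be smaller still, which is why the value above acts as an upper bound.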
As a final comment in this toy example, note that the analysis of variance (ANOVA) methodology for regression (see appendix A) can only be applied to the inverse regression approach, and that in this case the ANOVA table is based on $\mathrm{SST} = \sum_{i=1}^{n} (c_i - \bar{c})^2 = 33\,000$ and $S_{AA} = \sum_{i=1}^{n} (A_i - \bar{A})^2 = 0.006104104$. Then, if the LR hypotheses hold, the null hypothesis $H_0$: 'no linear relationship between $A$ and $c$' is rejected, since the corresponding p-value is $P(F_{1,8} > 41.28787) = 0.0002035$***. That is, we accept with very strong statistical significance that $A$ and $c$ are linearly related. We observe the concordance between the values in this ANOVA table and those of table 2. However, this is not true for the other approach, classical calibration. The reason is clear: the values recorded in its ANOVA table (which we have not reproduced here) are those of the regression curve of $A$ over $c$, $A = \beta_0 + \beta_1 c$, when used to predict the absorbance from the concentrations, and not vice versa. For this reason, the ANOVA methodology does not turn out to be useful for comparing the two approaches.
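The concordance between the ANOVA table and $R^2$ can be illustrated with the standard identity for simple linear regression with $n$ observations, $F = (n-2)R^2/(1-R^2)$, or equivalently $R^2 = F/(F + n - 2)$. The sketch below uses only the values of $F$, $n$ and SST reported above; the function name is hypothetical and the printed figures are a cross-check, not values taken from the tables.

```python
def r_squared_from_f(f_statistic: float, n: int) -> float:
    """R^2 implied by the ANOVA F statistic of a simple linear regression."""
    return f_statistic / (f_statistic + (n - 2))


n = 10
f_statistic = 41.28787      # reported F value with (1, 8) degrees of freedom
sst = 33000.0               # reported total sum of squares

r2 = r_squared_from_f(f_statistic, n)
ssr = r2 * sst              # sum of squares explained by the regression
sse = sst - ssr             # residual sum of squares

# Rebuilding F from the two sums of squares closes the loop.
f_rebuilt = (ssr / 1) / (sse / (n - 2))

print(round(r2, 4), round(f_rebuilt, 5))
```

The identity holds only for the regression whose response is the quantity being predicted, here the line of c over A, which is consistent with the remark that the ANOVA table of the classical calibration does not describe its errors in concentration.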

Conclusion
There are many very painstaking experimental works in which an analytical methodology to determine the concentration of a given substance by using spectrometry is described. Without trying to undermine the interest of these studies, it is necessary to mention that in them, in a systematic way, a gross error is made when the Beer-Lambert Law is applied to determine the concentration $c$ from the absorbance $A$. The pitfall consists in using the calibration curve of $A$ over $c$ (classical calibration), which is clearly not an optimal approach (see [13], for example), instead of using the calibration curve of $c$ over $A$ (inverse regression), which would be the appropriate one in the sense of minimizing the sum of the squared errors of prediction.
But this does not happen only in the application of Beer's Law: it is also common practice in other contexts where instrument calibration is used, when inexpensive and quick measurements (Y) are related to expensive and time-consuming measurements (X) based on a set of observations, and we are interested in estimating the expensive measurement X given a new measurement of Y. Instead of using the classical calibration approach, it is advisable, from the point of view of minimizing the sum of squared errors of prediction, to use the inverse regression. A guide on how to get it right is in figure 6.
Even if the LR hypotheses with regressor the absorbance and the concentration as response are not accomplished, the approximation of the inverse regression remains valid: to carry out predictions it is not necessary for the hypotheses to be fulfilled since the inverse regression approach relies on OLS, which does not depend on any hypothesis. Moreover, the greater predictive power of the inverse regression, compared with that of classical calibration, gives support to its use. This fact is founded on the fact that inverse regression minimizes the sum of the squared error of the predictions for the concentration given the absorbance, but it is also shown empirically in this work by a toy example in §5 and two more examples, one with real data and the other built by simulation, in appendix B.
That the LR hypotheses are fulfilled in the classical calibration approach is nothing more than an entelechy: how can we be sure of the normality of the absorbance distribution given the concentration value, which is assumed to be fixed (and determined without error, despite the fact that measurement errors are unavoidable), and of the rest of the hypotheses? Despite the (possible but not usual) use of methods for studying the goodness of fit of the observations to them, the assumption of the hypotheses of a model is always a delicate subject that could be considered, in a sense, a matter of faith. Evaluating the predictive capacity of a model by means of the sum of the squares of the errors of prediction is not. Even in the simulation example presented in appendix B, in which the absorbance values have been simulated from the fixed concentration values according to the equation of a straight line with additive Gaussian noise, that is, in such a way that the LR hypotheses can be assumed to hold with the concentration as regressor and the absorbance as output variable (classical calibration), it turns out that from a predictive point of view the inverse regression approach surpasses the classical calibration. In other words: leaving aside assumptions that may or may not hold (and that, when they do hold, allow some statistical properties of the linear regression model to be deduced), if we are interested in prediction, the most appropriate choice would be the inverse regression approach.
It is true that in many applications the difference between the predicted concentrations obtained with the two calibration curves is small and therefore, for practical purposes, this error does not usually have great consequences. However, this does not justify overlooking the confusion, which is important from a conceptual point of view. What is more, it could potentially have practical consequences, so it should be avoided. This paper aims to draw the attention of experimental scientists to this important issue and to contribute to the eradication of this pitfall.
Data accessibility. All scripts used in this study are openly accessible through https://github.com/StochasticBiology/boolean-efflux.git. The data are provided in electronic supplementary material [20]. I have used simulated data that I have uploaded in a csv format file.
Acknowledgements. The author wishes to thank the anonymous referees and Associate Editor for careful reading and helpful comments that resulted in an overall improvement of the paper.

Appendix A. The linear regression model
In this section, we present the formulae of the linear regression model, which describes the linear relationship between two quantitative variables, namely X, which is the input or regressor, and Y, which is the output or predicted variable. In each scenario, which of the two variables should play the role of X, and which that of Y, depends on the objective: the variable that has to play the role of Y is the one for which we want to obtain a prediction given a known value of the other variable (which then plays the role of X). This asymmetry between the variables must be taken into account, since it can be a source of confusion. Indeed, it is very important to resolve this issue at the beginning, before building the model, since making the wrong decision leads to predictions subject to greater error, as has been explained above to be common in instrument calibration by spectrometry. Highlighting and clarifying this matter is precisely the motivation of this paper.
The linear regression model of Y over X is the straight line that best fits the data, which is a set of n > 2 pairs of values of the variables X and Y, say $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, and is given by

$$y = b_0 + b_1 x,$$

where $b_0$ and $b_1$ are obtained from the data in this way:

$$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \tag{A 1}$$

with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $S_{xx} = \sum_{i=1}^{n} x_i^2 - n(\bar{x})^2$ and $S_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}$. (Note that the asymmetry between X and Y is reflected in the expressions for the coefficients $b_0$ and $b_1$ of the straight line.) In what sense is the regression line the one that best approximates the data? In the sense that it minimizes the sum of the squared errors, denoted by $e_i$, which are the differences between the observed value of the variable Y when the variable X takes the value $x_i$, which is $y_i$, and the prediction given by the regression straight line, which is $\hat{y}_i = b_0 + b_1 x_i$, that is, $e_i = y_i - \hat{y}_i$. If the relationship between X and Y were perfectly explained by the straight line (a hypothetical and deterministic situation), then $e_i = 0$ for $i = 1, \ldots, n$.
By imposing this criterion we can easily find (A 1). This is the well-known ordinary least squares (OLS) method, due to Carl F. Gauss. To apply this method, we must differentiate

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

with respect to $b_0$ and $b_1$, and set these two derivatives to zero. Indeed, we obtain

$$\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i\,(y_i - b_0 - b_1 x_i) = 0.$$

From the first we get

$$b_0 = \bar{y} - b_1 \bar{x},$$

and from the second, by substituting the expression obtained for $b_0$, we finally have that

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n(\bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

(further verification that it is indeed a minimum is necessary, although we will not go into details). A value that is used as a measure of how well the regression straight line approximates the n points is the determination coefficient or R-squared, defined by

$$R^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}},$$

with $S_{yy} = \sum_{i=1}^{n} y_i^2 - n(\bar{y})^2$, which is between 0 and 1 and is interpreted as the proportion of the total variability of the data that is explained by the regression straight line. The closer $R^2$ is to 1, the better the linear approximation of the relationship between variables X and Y. Its square root, with the sign of the slope $b_1$, is the well-known Pearson's correlation coefficient $r \in [-1, 1]$.
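The closed-form OLS solution above can be checked numerically. The following is a minimal sketch in plain Python (the paper itself works in R); the function name `ols_fit` and the toy data are hypothetical and serve only as an illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1, r2) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products, written as in the text.
    s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
    s_yy = sum(yi * yi for yi in y) - n * y_bar ** 2
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r2 = s_xy ** 2 / (s_xx * s_yy)
    return b0, b1, r2


# Hypothetical toy data, not taken from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r2 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, r2)
```

Because the first normal equation forces the residuals to sum to zero, `sum(residuals)` should vanish up to rounding error, which is a quick sanity check on any implementation.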

A.1. The hypotheses of the regression model (LR hypotheses)
The regression model assumes that for each fixed value of the variable X, $x_i$ ($i = 1, \ldots, n$), the random variable Y, which is denoted in this case by $Y_i$, has a Gaussian distribution with a mean which is a linear function of $x_i$, say $\gamma_0 + \gamma_1 x_i$, where $\gamma_0$ and $\gamma_1$ are parameters independent of i, and with variance $\sigma^2 > 0$, which is also a parameter independent of i, that is, we assume that

$$Y_i \sim N(\gamma_0 + \gamma_1 x_i,\ \sigma^2), \qquad i = 1, \ldots, n.$$

Moreover, we assume that the random variables $Y_1, \ldots, Y_n$ are independent. In other words,

$$Y_i = \gamma_0 + \gamma_1 x_i + \delta_i, \qquad i = 1, \ldots, n,$$

where $\delta_1, \ldots, \delta_n$ are independent and identically distributed random variables, $N(0, \sigma^2)$. These are the LR hypotheses that are needed in order to perform statistical inferences. We assume them in the remainder of appendix A. In this context, $b_0$ and $b_1$, the coefficients of the regression straight line, are the estimations of the parameters of the model $\gamma_0$ and $\gamma_1$, respectively, obtained from data, that is, $\hat{\gamma}_0 = b_0$ and $\hat{\gamma}_1 = b_1$.

A.2. The coefficient estimates
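Consider the estimations $b_0$ and $b_1$ of the coefficients of the linear regression model (respectively, $\gamma_0$ and $\gamma_1$ in equation (A 2)) given by (A 1). If in (A 1) we replace the observations $y_i$ by the random variables $Y_i$ of which they are assumed to be realizations, we obtain the expressions in (A 4) of the estimators of the coefficients, say $B_0$ and $B_1$, which are random variables of which the estimations $b_0$ and $b_1$, respectively, are realizations:

$$B_1 = \frac{\sum_{i=1}^{n} x_i Y_i - n\,\bar{x}\,\bar{Y}}{S_{xx}}, \qquad B_0 = \bar{Y} - B_1 \bar{x}, \tag{A 4}$$

with $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $S_{xx} = \sum_{i=1}^{n} x_i^2 - n(\bar{x})^2$.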
The Gauss-Markov theorem¹ says that if the hypotheses of the linear regression model (LR hypotheses) are satisfied, the estimators $B_0$ and $B_1$ are unbiased, that is, their distributions are centred at the corresponding coefficients,

$$E(B_0) = \gamma_0, \qquad E(B_1) = \gamma_1$$

(E denotes the expectation of a random variable, that is, its mean value), and they are the tightest possible in the sense that they have the smallest variance among all possible unbiased estimators of the coefficients that are linear functions of the variables $Y_1, \ldots, Y_n$. Then, they are the best linear unbiased estimators (BLUE) of the coefficients of the linear regression model. With regard to the other parameter of the model, $\sigma^2$, its estimation is given by (A 3), which is the realization of the estimator $\widehat{\sigma^2}$, a random variable independent of $B_0$ and $B_1$ defined by

$$\widehat{\sigma^2} = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - B_0 - B_1 x_i)^2.$$

A.3. The analysis of the variance (ANOVA) for the linear regression model
The principles and methodology of ANOVA (ANalysis Of the VAriance) can be applied to study whether there is a linear relationship between two variables X and Y. Specifically, we will carry out a statistical test for the hypotheses

$$H_0: \gamma_1 = 0 \qquad \text{versus} \qquad H_1: \gamma_1 \neq 0$$

($H_0$ is the null statistical hypothesis that corresponds to 'no linear relationship between the variables', while the alternative $H_1$ is the opposite). Considering that the quantities $x_1, \ldots, x_n$ are fixed, the total variability of the observations is measured by the 'total sum of squares' $\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, which can be decomposed as

$$\mathrm{SST} = b_1^2 S_{xx} + \mathrm{SSE}, \qquad \mathrm{SSE} = \sum_{i=1}^{n} e_i^2,$$

where SST has n − 1 associated degrees of freedom (over the n quantities $y_i - \bar{y}$, there is only one linear restriction: $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$), SSE has n − 2 degrees of freedom since we sum the squares of n terms with two independent linear restrictions, $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} e_i (x_i - \bar{x}) = 0$, and finally $b_1^2 S_{xx}$ has 1 degree of freedom, since $S_{xx}$ is fixed and only $b_1$ depends on the observations.
The statistical test of hypotheses consists in rejecting $H_0$ if $f = b_1^2 S_{xx} / \mathrm{MSE}$, with MSE = SSE/(n − 2), is 'big enough', meaning greater than a tabulated value. It can be seen (we do not give the details here) that f is the realization of a random variable F that follows Fisher's F distribution with 1 and n − 2 degrees of freedom if the null hypothesis $H_0$ is true, that is,

$$F \sim F_{1,\,n-2} \quad \text{under } H_0.$$

Therefore, the null hypothesis $H_0$ is rejected with a significance level α (then, a linear relationship between the variables is accepted) if

$$p\text{-value} = P(F_{1,\,n-2} > f) < \alpha.$$

Calculations necessary to obtain f are usually carried out with the help of the ANOVA table.

¹ As explained in [16], the method of OLS was developed by Gauss in Theoria combinationis observationum erroribus minimis obnoxiae (1823), where a first proof of an early version of the theorem is given. Markov rediscovered the same result and included it in his book Wahrscheinlichkeitsrechnung (1912), the year in which Fisher converted least squares into a general estimation method in statistics. The terminology Gauss-Markov theorem comes from Neyman. For historical details, see [17].
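As an illustration of the ANOVA quantities of this subsection, the following sketch computes SST, the regression sum of squares $b_1^2 S_{xx}$, SSE and the statistic f for a small dataset. It is written in plain Python; the function name `anova_f` and the data are hypothetical. The p-value itself requires the distribution function of Fisher's F, which the Python standard library does not provide, so it is deliberately left out here.

```python
def anova_f(x, y):
    """Return (sst, ssr, sse, f) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                      # n - 1 d.f.
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # n - 2 d.f.
    ssr = b1 ** 2 * s_xx                                          # 1 d.f.
    mse = sse / (n - 2)
    return sst, ssr, sse, ssr / mse


# Hypothetical toy data, not taken from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse, f = anova_f(x, y)
print(sst, ssr + sse, f)
```

The first two printed values should agree up to rounding, which is the decomposition of the total sum of squares stated above.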

A.4. Predicting with the linear regression model
Given a value for the variable X, let us say $x_0$, the straight line equation is used to predict the corresponding value of the variable Y, which is denoted by $\hat{y}|_{x_0}$, in the following way:

$$\hat{y}|_{x_0} = b_0 + b_1 x_0.$$

The prediction can be carried out as long as the value $x_0$ lies within the range of values given by $x_1, \ldots, x_n$, and if the linear approximation is good ($R^2$ big enough).

A.5. Confidence intervals for the coefficients
For a fixed confidence level γ ∈ (0, 1), the confidence intervals for the coefficients of the regression straight line are

$$\gamma_1:\ b_1 \pm t_{n-2,\,1-\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} e_i^2/(n-2)}{S_{xx}}}$$

and

$$\gamma_0:\ b_0 \pm t_{n-2,\,1-\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},$$

where α = 1 − γ and $t_{n-2,\,1-\alpha/2}$ is the critical value for Student's t distribution with n − 2 degrees of freedom, $t_{n-2}$, such that the probability that this distribution gives a value greater than the critical value is α/2 (that is, given ω ∈ (0, 1), $t_{n-2,\,\omega}$ denotes the real number such that $P(t_{n-2} < t_{n-2,\,\omega}) = \omega$).

A.6. Confidence interval for the prediction
For a fixed value of the variable X, say $x_0$, and a confidence level γ ∈ (0, 1), the confidence interval for the prediction for the variable Y, $\hat{Y}|_{x_0} = \gamma_0 + \gamma_1 x_0$ (which can be thought of as a new parameter, a function of $\gamma_0$ and $\gamma_1$, whose estimation is $\hat{y}|_{x_0}$), is

$$\hat{y}|_{x_0} \pm t_{n-2,\,1-\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)},$$

with α = 1 − γ. The value of $x_0$ that minimizes the length of the confidence interval for the prediction is $x_0 = \bar{x}$. As $x_0$ moves away from $\bar{x}$ (in either direction) the length increases symmetrically.

A.8. Prediction interval
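For a fixed value of the variable X, say $x_0$, and a confidence level γ ∈ (0, 1), the prediction interval is an interval 'of the most probable values' for the variable Y, which when X = $x_0$ we denote by $Y_0$, that is, $Y_0 = \gamma_0 + \gamma_1 x_0 + \delta_0$ with $\delta_0 \sim N(0, \sigma^2)$ independent of $\delta_1, \ldots, \delta_n$. Informally speaking, the prediction interval is an interval where the variable $Y_0$ takes values with probability γ, and has the expression

$$\hat{y}|_{x_0} \pm t_{n-2,\,1-\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)},$$

with α = 1 − γ and $t_{n-2,\,1-\alpha/2}$ the critical value of Student's t distribution with n − 2 degrees of freedom.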
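To make the difference between the two intervals concrete, the sketch below computes, for a given $x_0$, the radius of the confidence interval for the mean response and the radius of the prediction interval, using the standard formulae for simple linear regression. It is plain Python; the function name `interval_radii` and the data are hypothetical, and the critical value of Student's t is passed in by the caller (here 3.182, the tabulated 0.975 quantile for 3 degrees of freedom), since the standard library has no t distribution.

```python
import math


def interval_radii(x, y, x0, t_crit):
    """Return (y_hat, ci_radius, pi_radius) at x0 for the fitted line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual variance estimate: sum of squared errors over n - 2.
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Term shared by both intervals; it grows as x0 moves away from x_bar.
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    ci_radius = t_crit * math.sqrt(s2 * leverage)
    pi_radius = t_crit * math.sqrt(s2 * (1.0 + leverage))
    return b0 + b1 * x0, ci_radius, pi_radius


# Hypothetical toy data (n = 5, so n - 2 = 3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # tabulated value for gamma = 0.95 and 3 d.f. (assumption of this sketch)

y_hat, ci_radius, pi_radius = interval_radii(x, y, 3.0, T_CRIT)   # x0 = mean of x
_, ci_low, _ = interval_radii(x, y, 1.0, T_CRIT)
_, ci_far, pi_far = interval_radii(x, y, 5.0, T_CRIT)
print(y_hat, ci_radius, pi_radius, ci_far, pi_far)
```

The prediction interval is always wider than the confidence interval because of the extra unit term under the square root, and both are narrowest at the mean of the $x_i$.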

A.9. Prediction interval for classical calibration
The problem with classical calibration is that to make predictions we have to deal with the reciprocal of the slope, which follows a Gaussian distribution under the hypotheses of the linear regression model. The reciprocal of a Gaussian random variable has infinite variance (then, the mean squared error is infinite) and, although an asymptotic approximation can be derived using the Delta method (see [14]), it has limitations. Formulae (4.32) and (4.32a) in [18, p. 169] give, for any absorbance $A_i$, the corresponding prediction interval using the classical calibration and under the hypotheses of the linear regression model, with $\tilde{e}_i$ being the errors committed with the classical calibration, not when predicting concentration from absorbance but when predicting absorbance from concentration, that is, $\tilde{e}_i = A_i - (b_0 + b_1 c_i)$. See also formula (5) in [14].
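The two ways of predicting a concentration from an absorbance can be put side by side in a few lines. The sketch below, in plain Python with hypothetical data and names (`fit_line`, `conc`, `absb`), fits the classical calibration line (absorbance on concentration) and inverts it, then fits the inverse regression line (concentration on absorbance) and uses it directly.

```python
def fit_line(u, v):
    """Least-squares line of v on u; returns (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


# Hypothetical standards: known concentrations and measured absorbances.
conc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
absb = [0.12, 0.19, 0.33, 0.38, 0.52, 0.58]

# Classical calibration: regress absorbance on concentration, then invert,
# which involves the reciprocal of the fitted slope.
a0, a1 = fit_line(conc, absb)
pred_classical = [(a - a0) / a1 for a in absb]

# Inverse regression: regress concentration on absorbance and predict directly.
d0, d1 = fit_line(absb, conc)
pred_inverse = [d0 + d1 * a for a in absb]

sse_classical = sum((c - p) ** 2 for c, p in zip(conc, pred_classical))
sse_inverse = sum((c - p) ** 2 for c, p in zip(conc, pred_inverse))
print(sse_classical, sse_inverse)
```

On the data used for fitting, the inverse regression line cannot have a larger sum of squared errors in concentration than the inverted classical line, because both are straight lines in the absorbance and the former is, by construction, the one that minimizes that sum.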
Appendix B. Two more examples

B.1. An example of practical calibration with real experimental data
The following example of practical calibration is borrowed from [19] and can be used to compare the approaches of classical calibration and inverse regression. The data are given in table 3.
For the difference between the absolute value of the errors in prediction with the classical calibration and the inverse regression (the former minus the latter) (table 5), we perform a one-sided Wilcoxon signed-rank test, which is the non-parametric counterpart of the t-test, to compare its median with 0 (the p-value of the Shapiro-Wilk test for normality is $1.976 \times 10^{-8}$***, meaning that we have enough evidence to reject the normality of the sample). The p-value of the one-sided Wilcoxon test with the alternative hypothesis 'the median of the difference is greater than 0' is 0.0002126***, which shows a clear statistical significance in favour of inverse regression.
Table 5. Predictions with the two methods, classical calibration and inverse regression, and corresponding radius of the prediction intervals and errors, for data in table 4. In italics the maximum $R^2$ and the minimum standard error s.e. Columns: predictions $\hat{c}_i$, prediction interval radius, errors.
With respect to the prediction interval radius, for all the (n = 70) observations, the radius for the inverse regression is less than that of the classical calibration approach. We can perform a statistical test to check whether the median of the difference of the prediction interval radius (classical calibration minus inverse regression) is significantly greater than 0. As the p-value of the Shapiro-Wilk test for normality is $8.925 \times 10^{-14}$***, we reject normality and perform the one-sided Wilcoxon signed-rank test, obtaining a p-value of $1.793 \times 10^{-13}$***, which expresses a very high statistical significance in favour of the inverse regression approach.

B.2. An example by simulation
Apart from the toy example in §5 and the practical calibration example with real experimental data in the first subsection of this appendix, we now perform the following simulation experiment. First, a dataset with some values of concentration and the corresponding absorbances is created by simulation: the absorbances are obtained from the concentrations according to the equation of a straight line with an additive Gaussian noise $\varepsilon_i \sim N(\mu = 0, \sigma^2 = 10)$, all generated independently. We use the function rnorm of R, and fix a random seed for reproducibility with set.seed(123). As some values of the absorbance may be negative, such observations are deleted; how many depends on the Gaussian values that have been randomly generated. In our case, we are left with a final number of n = 447.
Table 8. Average mean sum of squared errors for both approximations, classical calibration and inverse regression, with k-fold cross-validation, k = 10, p-value for the one-sided t-test to compare the differences in the mean/median, and p-value of the exact binomial test in favour of the inverse regression (except those marked with †, which are in favour of the classical calibration), with the number of folds, out of the 10 there are, for which the mean sum of squared errors is greater for the inverse regression than for the classical calibration in brackets. All the p-values are significant except those for $\sigma^2 < 2$. Columns: average mean sum of squared errors, p-value, p-value.
Second, we use k-fold cross-validation with k = 10 to evaluate the prediction error with the two approaches, classical calibration and inverse regression. Indeed, we randomly order the n instances (using the sample function of R), and then divide the observations into 10 folds, the first 9 composed of $\lfloor n/10 \rfloor$ observations (in this case, 44), and the last with the rest (51 observations). Then, for each fold:
(a) We reserve the fold for validation and learn the linear regression models (to follow the two approaches) with the rest of the folds as a learning (training) set.
(b) Once the two linear regression models have been learned, we follow the two approaches to predict, for each of the instances in the validation set, the concentration value from the known absorbance.
(c) As we know the observed concentration value corresponding to the absorbance of any observation in the validation set, we can compare the observed and the predicted values obtained with the two approaches.
(d) For each fold and approach, we compute the sum of the squared errors in making predictions and divide it by the number of instances minus 2, to compensate for the fact that one of the folds has more observations than the others, obtaining in this way the mean sum of squared errors. Be careful: we are making predictions for the concentrations given the absorbances of new observations not seen by the regression models, which are the observations of the validation dataset. This is different from the usual situation in which we evaluate the predictive capacity of the model by making predictions for the same observations that have been used to construct the model.
(e) Finally, we have two paired samples of size k = 10 of values of the mean sum of squared errors, which can be used to perform a statistical test to compare the two approaches from the point of view of their predictive power.
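Steps (a) to (e) can be sketched as follows. This is only an illustration in plain Python, not the R scripts used for the paper: the straight-line parameters (intercept 2.0, slope 0.5), the range of the concentrations and the helper names are assumptions made for this sketch, and the random numbers differ from those produced by R with set.seed(123).

```python
import random


def fit_line(u, v):
    """Least-squares line of v on u; returns (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


def make_folds(n, k, rng):
    """Shuffle 0..n-1; the first k-1 folds have n // k items, the last the rest."""
    idx = list(range(n))
    rng.shuffle(idx)
    size = n // k
    folds = [idx[i * size:(i + 1) * size] for i in range(k - 1)]
    folds.append(idx[(k - 1) * size:])
    return folds


def cross_validate(conc, absb, k=10, seed=0):
    """Return one (mse_classical, mse_inverse) pair per validation fold."""
    results = []
    for fold in make_folds(len(conc), k, random.Random(seed)):
        held_out = set(fold)
        train = [i for i in range(len(conc)) if i not in held_out]
        c_tr = [conc[i] for i in train]
        a_tr = [absb[i] for i in train]
        a0, a1 = fit_line(c_tr, a_tr)   # classical: absorbance on concentration
        d0, d1 = fit_line(a_tr, c_tr)   # inverse: concentration on absorbance
        sse_cl = sum((conc[i] - (absb[i] - a0) / a1) ** 2 for i in fold)
        sse_in = sum((conc[i] - (d0 + d1 * absb[i])) ** 2 for i in fold)
        m = len(fold) - 2               # divisor described in step (d)
        results.append((sse_cl / m, sse_in / m))
    return results


# Simulated data under assumed (hypothetical) line parameters.
rng = random.Random(123)
raw_conc = [rng.uniform(0.0, 50.0) for _ in range(447)]
raw_absb = [2.0 + 0.5 * c + rng.gauss(0.0, 10.0 ** 0.5) for c in raw_conc]
pairs = [(c, a) for c, a in zip(raw_conc, raw_absb) if a >= 0.0]  # drop negatives
conc = [c for c, _ in pairs]
absb = [a for _, a in pairs]

results = cross_validate(conc, absb, k=10, seed=1)
for mse_classical, mse_inverse in results:
    print(round(mse_classical, 3), round(mse_inverse, 3))
```

The list `results` holds the two paired samples of size k = 10 mentioned in step (e); any paired test can then be applied to the differences between the first and second components.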
In table 7, we have recorded the mean sum of squared errors for each fold.
For the difference between the mean sum of squared errors with the classical calibration and the inverse regression (the former minus the latter), we can perform a one-sided t-test to compare its mean with 0 (since the Shapiro-Wilk test for normality gives a p-value of 0.5422, which implies that we do not have enough evidence to reject the normality of the sample). The p-value of the one-sided t-test with the alternative hypothesis 'the mean of the difference is greater than 0' is 0.0009752***, giving a very high statistical significance in favour of inverse regression being better than classical calibration (smaller mean sum of squared errors when predicting new cases). If, instead, we had used the non-parametric Wilcoxon signed-rank test, not assuming normality of the sample of the differences, the one-sided p-value would still be very small, 0.001953**, showing statistical significance in the same sense.
Finally, it is also possible to compute the p-value of the exact binomial test in favour of the inverse regression, taking into account that out of 10 cases, there are nine in which the mean sum of squared errors is greater for the classical calibration, and only one in which it is less,

$$p\text{-value} = P\big(B(10,\ p = 0.5) = 1\big) = \binom{10}{1}\, 0.5^{10} = 10 \times 0.5^{10} = 0.009765625^{**}$$

(showing significance at the 1% level). As a conclusion, we can see that even in this example, in which the absorbance values have been simulated from those of the concentration so that the LR hypotheses can reasonably be assumed with the classical calibration approach, which favours this approach, from the perspective of predictive power it is better to use the inverse regression approach instead, in agreement with the conclusions in [13].
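The arithmetic of the binomial probability above is easy to reproduce. The following short sketch, in plain Python with a hypothetical helper name `binom_pmf`, evaluates the probability mass of a binomial B(10, 0.5) variable at 1.

```python
import math


def binom_pmf(k, n, p):
    """Probability that a Binomial(n, p) variable equals k."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


# One fold out of 10 favours the classical calibration.
p_value = binom_pmf(1, 10, 0.5)
print(p_value)  # 10 / 1024 = 0.009765625
```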
To evaluate the possible effect of the variance $\sigma^2$, which up to now we have chosen to be 10 to simulate the absorbance values, we repeat the procedure with other values ranging from 0.01 to 30. In table 8, we record for each $\sigma^2$ the values that were computed before for the case $\sigma^2 = 10$: the average of the mean sum of squared errors (for both the classical calibration and the inverse regression), the p-value of the one-sided t-test (or Wilcoxon signed-rank test, as appropriate) to compare the differences (mean/median of the classical calibration greater than that of inverse regression), and the p-value of the exact binomial test in favour of the inverse regression. We can observe clear evidence in favour of the inverse regression approach if $\sigma^2$ is big ($\sigma^2 \geq 2$), and no differences when $\sigma^2$ is small, which agrees with intuition.