Philosophical Transactions of the Royal Society B: Biological Sciences
Opinion piece

Applicability of drug response metrics for cancer studies using biomaterials

Elizabeth A. Brooks
Department of Chemical Engineering, University of Massachusetts Amherst, 240 Thatcher Road, Amherst, MA 01003-9364, USA

Sualyneth Galarza
Department of Chemical Engineering, University of Massachusetts Amherst, 240 Thatcher Road, Amherst, MA 01003-9364, USA

Maria F. Gencoglu
Department of Chemical Engineering, University of Massachusetts Amherst, 240 Thatcher Road, Amherst, MA 01003-9364, USA

R. Chase Cornelison
Department of Biomedical Engineering and Mechanics, Virginia Tech, 325 Stanger Street, Blacksburg, VA 24061, USA

Jennifer M. Munson
Department of Biomedical Engineering and Mechanics, Virginia Tech, 325 Stanger Street, Blacksburg, VA 24061, USA

Shelly R. Peyton
Department of Chemical Engineering, University of Massachusetts Amherst, 240 Thatcher Road, Amherst, MA 01003-9364, USA

    Abstract

    Bioengineers have built models of the tumour microenvironment (TME) in which to study cell–cell interactions, mechanisms of cancer growth and metastasis, and to test new therapies. These models allow researchers to culture cells in conditions that include features of the in vivo TME implicated in regulating cancer progression, such as extracellular matrix (ECM) stiffness, integrin binding to the ECM, immune and stromal cells, growth factor and cytokine depots, and a three-dimensional geometry more representative of the in vivo TME than tissue culture polystyrene (TCPS). These biomaterials could be particularly useful for drug screening applications to make better predictions of efficacy, offering better translation to preclinical models and clinical trials. However, it can be challenging to compare drug response reports across different biomaterial platforms in the current literature. This is, in part, a result of inconsistent reporting and improper use of drug response metrics, and vast differences in cell growth rates across a large variety of biomaterial designs. This study attempts to clarify the definitions of drug response measurements used in the field, and presents examples in which these measurements can and cannot be applied. We suggest as best practice to measure the growth rate of cells in the absence of drug, and follow our ‘decision tree’ when reporting drug response metrics.

    This article is part of a discussion meeting issue ‘Forces in cancer: interdisciplinary approaches in tumour mechanobiology’.

    1. Introduction

    Pharmacology metrics, such as IC50 (the inhibition concentration of a drug where the response is reduced by half), EC50 (the effective concentration of a drug that gives half-maximal response) and Emax (the drug's maximum effect), have been used to evaluate the results of drug response assays and describe drug potency. Recently, Hafner et al. [1] defined the GR50: the concentration of a drug that reduces cell growth rate by half. The GR50 is an important contribution to the field of drug screening, because it accounts for the variable differences in growth rates between different cell lines. However, drug response metrics can be misrepresented or applied incorrectly in certain instances, which has led to inconsistent results between studies. One high-profile study, Haibe-Kains et al. [2], reported inconsistencies between two large pharmacogenomic studies: the Cancer Genome Project (CGP) [3] and the Cancer Cell Line Encyclopedia (CCLE) [4]. They compared the IC50 and the area under the dose–response curve (AUC) for 15 drugs across 471 cell lines, and found very little correlation between the two studies (Spearman's rank correlation of 0.28 and 0.35 for IC50 and AUC, respectively) [2]. Discrepancies between these studies and others could be attributed to differences between experimental protocols (e.g. type and length of assay, cell culture substrate and medium used), method of dose–response analysis or because different laboratories use and apply these pharmacological metrics to their results differently.

    This type of inconsistency has extended to bioengineering, where new biomaterial platforms have been developed to incorporate features of the tumour microenvironment (TME), e.g. geometries, coculture systems and tunable extracellular matrix (ECM) stiffness. Bioengineers have postulated that these ECM cues from the TME could radically impact drug responses, which could be important for predicting the efficacy of a drug before embarking on preclinical studies. During a search of the literature, we observed that bioengineers have quantified drug responses using many different drug response metrics; however, it is not clear in every study why certain reporting tools were used, and whether or not they were applied correctly. For example, we found cases where an IC50 was reported, but the drug was not effective enough to inhibit growth of half the cell population. A particular consideration when using different types of biomaterials is that cell growth rates differ between two-dimensional (2D) and three-dimensional (3D) systems, which raises the question of whether the same metric should be used for both.

    When analysing the literature, we wondered whether the implementation of a global, consistent analysis could reduce the disagreement of values reported. Could methods used to analyse drug responses in 2D culture also apply for 3D systems? In the case of coculture systems, should a different approach be used to separate the responses from cancer cells and healthy cells in the TME? With this in mind, this perspective paper compares select cases in the literature, our own data for cell responses to drugs in and on biomaterials, metrics reported and inconsistencies between studies. We end with a recommendation for the incorporation of additional drug response metrics when working with biomaterial systems.

    2. Definitions of drug response metrics

    Drug response assays evaluate the impact of a drug on a population of cells over a range of concentrations. For simplicity, we will define the number of cells at the start of the assay readout, or the ‘initial value’ of the cells, as y0 (figure 1a). Unfortunately, many studies do not report y0, which prevents some metrics from being reported, as we discuss later in this section. The cells are then incubated with drug for a defined period of time (typically 24–72 h), and cell viability is measured (yfinal). Cells are also incubated with a small amount of the vehicle in which the drug was dissolved (often DMSO or water), serving as a control (yctrl). A drug is considered cytostatic if it slows or completely prevents growth of cells [5]. In other words, if the measured cell viability is between y0 and yctrl, that drug is said to be cytostatic at that concentration. Cytotoxic means that the drug reduces the cell number below the initial cell count (yfinal < y0). Note that when y0 is not measured, a true drug cytotoxicity cannot be reported.
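    The cytostatic and cytotoxic definitions above can be written as a short check. This is a minimal sketch, not code from this article: the function name and the 'no effect' label for readings at or above the control are our own assumptions, and the cell counts are illustrative values matching the 20 k and 100 k levels used in figure 1.

```python
def classify_response(y0: float, y_ctrl: float, y_final: float) -> str:
    """Label the response at one drug concentration.

    y0      -- cell number at the start of the assay readout
    y_ctrl  -- cell number in the vehicle control at the end of the assay
    y_final -- cell number in the drug-treated well at the end of the assay
    """
    if y_final < y0:
        return "cytotoxic"   # fewer cells than at the start: net cell death
    if y_final < y_ctrl:
        return "cytostatic"  # growth slowed or fully halted, no net loss
    return "no effect"       # assumed label: at or above the vehicle control


print(classify_response(y0=20_000, y_ctrl=100_000, y_final=10_000))  # cytotoxic
print(classify_response(y0=20_000, y_ctrl=100_000, y_final=60_000))  # cytostatic
```

    Without a measured y0 the first branch cannot be evaluated, which is why cytotoxicity cannot be reported in that case.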

    Figure 1.

    Figure 1. Definitions and examples of drug response metrics. The IC50 represents the drug concentration where the response is reduced by half. The EC50 represents the concentration of a drug that gives half-maximal response. The GI50 represents the concentration of a drug that reduces total cell growth by 50%. The GR50 represents the concentration of a drug that reduces cell growth rate by 50%. The Emax represents the fraction of viable cells at the highest drug concentration (maximal response), and the AUC represents the area under the dose–response curve. The y-axis shows the cell count (top plots) and normalized growth rate (bottom plots). Drugs are considered ‘cytotoxic’ if viability is reduced below the initial value (y0), and ‘cytostatic’ if viability is above the initial value, but below the control value (yctrl). Right curves (less potent) show a drug which reduces viability by 50% at maximum dose (IC50 is the maximum dose). Middle curves (moderately potent) show a drug which completely inhibits growth, but is not cytotoxic (Emax = initial viability). Left curves (potent) show a drug which is 100% cytotoxic (Emax = 0). Note that in these special cases, some of the other metrics are also equal to each other, which are labelled on the plot. (Online version in colour.)

    There are six typical metrics used to report the effect of a drug on a cell culture: IC50, EC50, GI50, GR50, Emax and AUC (figure 1a). Figure 1 gives definitions of these metrics, with three hypothetical drug response curves with varying degrees of ‘potency’. A ‘potent’ drug is 100% cytotoxic, a ‘moderately potent’ drug achieves 100% growth inhibition but no net cell death and a ‘less potent’ drug reduces cell growth by 50% (figure 1b–g). Drug potency can also be evaluated by the curve response class classifier (CRC), as described by Inglese et al. [6]. The CRC metric helps group the efficacy of cell and drug combinations to reveal if a particular combination is fully cytotoxic or cytostatic, and can be valuable in cases where a full dose–response curve is not obtained. Furthermore, these classifications aid in determining promising cases to select for future screening studies.

    The IC50 and Emax metrics do not consider the initial population (y0), nor the number of cell divisions during the length of the assay, which was a motivating factor for Hafner et al. [1] to define the GR50. Only the GI50 and GR50 take y0 into consideration. GI50 is the dose that inhibits the growth of cells by 50%, and GR50 represents inhibition of the growth rate, not total growth, of the cell culture. The initial cell population, y0, can vary between type of assay, cell type or length of assay. To account for this variation, GR50 is represented as data normalized with respect to the initial values (figure 1e–g).

    Although the ‘50’ in IC50, EC50, GI50 and GR50 metrics signifies a 50% inhibition, they can be used with values other than 50 to indicate different effects, e.g. IC90 [7,8]. Negative values can be used for the cytotoxic regime (yfinal < y0), although these do not come from the formal definitions of GI or GR. In this case, GI-10 would be the concentration where the cells are reduced 10% from the initial value (yfinal = 0.9 × y0), and GI-100 would be the concentration which kills all the cells. In the figure 1 example, 0–100 is defined over the range of 20 k ≤ y ≤ 100 k, while −100 to 0 is defined over the range of 0 ≤ y ≤ 20 k. IC-n or EC-n values are not possible since these metrics do not consider initial values.
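    The two ranges described above can be expressed as a signed growth percentage. The sketch below assumes a commonly used percentage-growth convention (growth scaled to the control above y0, and to y0 itself below it); the function name is hypothetical, and the convention is our assumption rather than a formula stated in this article, although it reproduces the GI-10 example (yfinal = 0.9 × y0).

```python
def percent_growth(y0: float, y_ctrl: float, y_final: float) -> float:
    """Signed growth: 100 = control growth, 0 = no net growth, -100 = no cells left."""
    if y_final >= y0:
        # Cytostatic regime: fraction of the control's net growth that remains.
        return 100.0 * (y_final - y0) / (y_ctrl - y0)
    # Cytotoxic regime: fraction of the initial population that was lost.
    return 100.0 * (y_final - y0) / y0


# Figure 1 style numbers: y0 = 20 k cells, control = 100 k cells.
print(percent_growth(20_000, 100_000, 60_000))  # 50.0  (the GI50 point)
print(percent_growth(20_000, 100_000, 18_000))  # -10.0 (the GI-10 point)
print(percent_growth(20_000, 100_000, 0))       # -100.0 (all cells killed)
```

    Both branches require y0, which is why no equivalent negative scale exists for IC or EC values.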

    The Emax represents the maximum effect of the drug, and the AUC represents its cumulative effect. Emax is the fraction of viable cells at the highest drug concentration tested in the experiment, and AUC is the area under the viability curve for a cell population over the tested drug concentration range. Although neither of these metrics makes any explicit assumption about growth kinetics, they still depend on the concentration range, experiment duration and cell growth rate, which means that their reported values cannot be compared with other studies in most cases. Fallahi-Sichani et al. [9] found AUC to be a robust response metric when the goal was to compare a single drug across identical cell lines. However, the cells need to be exposed to identical dose ranges, and preferably at an intermediate concentration. Emax can be used with multiple drugs and concentration ranges but is more informative at high doses [9]. In the case of Emax in particular, Fallahi-Sichani et al.'s work highlighted it as a parameter that yielded high variation independent of cell proliferation rate. Yet, this study was unable to conclude what drug metric parameter best describes a drug response without considering drug concentration.
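    To illustrate why Emax and AUC depend on the tested concentration range, the sketch below computes both from a small dose-response series. The data are invented, and integrating with the trapezoid rule over log10(concentration) is an assumption on our part; studies differ in how they define the integration axis and normalization.

```python
import math


def emax(concs, viability):
    """Fraction of viable cells at the highest tested concentration."""
    highest = max(range(len(concs)), key=lambda i: concs[i])
    return viability[highest]


def auc(concs, viability):
    """Trapezoidal area under viability versus log10(concentration)."""
    points = sorted(zip(concs, viability))
    area = 0.0
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        area += 0.5 * (v_lo + v_hi) * (math.log10(c_hi) - math.log10(c_lo))
    return area


concs = [0.01, 0.1, 1.0, 10.0]    # hypothetical doses (µM)
viability = [1.0, 0.8, 0.4, 0.2]  # hypothetical viable fractions

print(emax(concs, viability), round(auc(concs, viability), 2))          # 0.2 1.8
# The same cells, but with the top dose left out of the experiment:
print(emax(concs[:3], viability[:3]), round(auc(concs[:3], viability[:3]), 2))  # 0.4 1.5
```

    Dropping a single dose changes both values, even though the underlying drug response is identical.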

    The IC50 is the most commonly reported drug response metric [10] and, therefore, it is important to highlight cases in which it is used with an incorrect definition. For instance, the IC50 should not be considered a measure of cell death [5]. As one example, in a case when the control value is more than 200% of the initial value (yctrl > 2 × y0, as can be seen in the examples given in figure 1b–d), the IC50 will result in a ‘cytostatic’ dose, but the cells are still growing, albeit at a reduced rate. Second, in a case where a reduction in half the population is not reached (such as in [11–13]), the IC50 cannot be appropriately calculated, and instead the EC50 is the more appropriate metric to report. In other instances in the literature, the EC50 and GI50 are confused with the IC50 [1]. However, the GI50 metric is a correction of the IC50, since it takes into consideration the initial cell count (y0) [14,15].

    Through our analysis, we also found examples where authors report a GI50, when it is actually an IC50 (they did not measure y0) [16]. For example, the cell population could grow over the course of an experiment, while the measured population values could still be lower than the control. Therefore, the initial cell populations must be measured to know whether a drug is killing cells or only slowing their growth. In addition, the IC50 is sometimes discussed in the context of growth inhibition [17], although it is not capable of measuring this. We thus recommend the field report the metric that is most appropriate for their observed responses and experimental conditions, given the explanations stated above. We also recommend that researchers measure the initial cell population values (y0), which will enable them to calculate GI50 and GR50 (if cell growth of the control is achieved over the course of the assay), particularly important where multiple cell lines or growth conditions are being tested as these metrics will account for differences in growth rates. This is also the only way to know if a drug is truly cytotoxic, as we mentioned earlier.

    The GR50 is very similar to the GI50, but is defined by a reduction in the growth rate rather than in total cell growth. Growth rate inhibition is calculated from initial and control values, and the fitting for the GR50 relies on the assumption that the cells are in exponential growth before application of the drug. GR50 is thus reported to be more robust than GI50 against variations in experimental protocols and conditions [1].
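    As a worked illustration of growth rate inhibition, the sketch below computes a GR value from y0, yctrl and yfinal, using the ratio of treated to control doublings as we understand the definition of Hafner et al. [1]; reference [1] remains the authoritative source for the formula. The function name, the handling of yfinal = 0 and the cell counts are our own assumptions.

```python
import math


def gr_value(y0: float, y_ctrl: float, y_final: float) -> float:
    """Normalized growth rate: 1 = control growth, 0 = stasis, -1 = no cells left."""
    if y_ctrl <= y0:
        # Without control growth the ratio of doublings is undefined.
        raise ValueError("control did not grow; GR value is undefined")
    if y_final <= 0:
        return -1.0  # limit of the expression as y_final approaches zero
    doublings_treated = math.log2(y_final / y0)
    doublings_control = math.log2(y_ctrl / y0)
    return 2.0 ** (doublings_treated / doublings_control) - 1.0


# Control doubles twice (20 k -> 80 k); treated cells double once (20 k -> 40 k).
print(round(gr_value(20_000, 80_000, 40_000), 3))  # 0.414
print(gr_value(20_000, 80_000, 20_000))            # 0.0
```

    In this example the treated cells achieve one third of the control's net growth (20 k of 60 k cells), yet the GR value is 0.414, because it compares growth rates rather than total growth.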

    3. Applying drug response metrics to data obtained from biomaterial drug screening assays

    Drug screening with cells on biomaterials rather than on tissue culture polystyrene (TCPS) is increasingly popular due to the ability to capture more physiologically relevant features in biomaterials that may impact drug response. Two-dimensional [18–20] and 3D [11,21,22] biomaterial platforms have been developed to study cell behaviour in vitro. Since it is widely accepted that cells grow at different rates in 2D and 3D biomaterial platforms [23], it is difficult to compare drug responses across these different environments without a GI50 or GR50. Experiments to obtain these metrics require only a minor adjustment to traditional drug screening protocols: seeding cells in an additional plate to measure initial population values (y0) (figure 2a). In particular, the GR50 has worked very well for over 4000 combinations of breast cancer cell lines and drugs on TCPS [24], but has only been used in 3D biomaterials in limited reports [25].

    Figure 2.

    Figure 2. (a) Schematic of typical experimental work flow for a drug response assay. Cells are seeded on a 2D tissue culture plastic surface, on a 2D biomaterial or within a 3D biomaterial for drug dosing. Wells in a second plate are seeded with the same conditions as the drug dosing plate to measure GI50 or GR50 values. After 24 h, drugs are added to the drug dosing plate and the second plate for initial values is assayed simultaneously for initial cell counts. The drug dosed plate is incubated for a period of time (e.g. 48 h) and then assayed for the final cell response. The collected data is used to calculate drug response metrics. (b) Cells grown on tissue culture plastic achieve sufficient growth to generate a traditional dose-response curve, as well as a GR values curve to calculate a GR50. (c) An example of patient cells grown on tissue culture plastic that do not grow exponentially over the course of the dosing assay. This results in a curve for traditional drug response metrics, but a GR curve cannot be calculated. (d) This is a case where cells grow over the course of the assay, but sufficient growth for calculating a GR50 measurement is not achieved because the resulting GR values are less than 0.5, which is the point where the GR50 is calculated. (e) Cell line MCTS encapsulated in a degradable 3D hydrogel demonstrates enough growth to calculate a GR50 and other drug response metrics. (f) Curve response classification descriptions for the data shown here. The cases presented here do not exactly correspond with the criteria described by Inglese et al. (particularly r2 values), but we categorized them to their nearest classification.

    Our own laboratory uses both 2D and 3D biomaterials in addition to TCPS for drug screening studies [26,27], and we have adapted our experimental procedures to collect data for calculating GR metrics in addition to the other metrics depicted in figure 1. However, we have found that the GR50 cannot always be applied. As it has been previously reported [1,24], it is necessary for the cells to achieve exponential growth over the course of the assay to use the GR50. On TCPS, this is not an issue, as demonstrated by our data in figure 2b with SKOV-3 ovarian cancer cells dosed with paclitaxel. In this case, the GR values span a −1 to 1 range, which results in a good curve fit to calculate a GR50.

    Another important factor to consider in preclinical drug screening assays is the increasing use of patient-derived primary cells. This has become a hurdle as many primary cells grow more slowly, or in some cases not at all, making it impossible to calculate a GR50 value. In these cases, the calculated GR values remain well below 0.5, even at a control dose. Therefore, the formula for calculating GR values can be applied, but with low growth rates the GR50 specifically does not apply (though the GR curve could still provide useful information). As illustrated by our own data of patient-derived cells—ovarian cancer cells from ascites dosed with cisplatin on TCPS (figure 2c)—IC50 and EC50 values could be calculated, but the growth rate was too slow for a GR50 value, and one could not be calculated. This serves as an example in which additional drug response metric parameters are necessary to understand the effect of drug dosing, given that these primary cells proliferated very slowly when grown in a 3D environment.

    For example, work by Longati et al. [11] highlights how pluripotent stem cell (PSC) drug response differs on 2D versus in 3D biomaterials. Although IC50 values were not reported in this work, we calculated the IC50 and EC50 from their published data and observed higher resistance in their PSC cells in 3D compared to 2D (electronic supplementary material, table S1). Ivanov et al. [22] performed drug response studies with neural stem cells (NSCs) and the UW228-3 glioblastoma cell line in 3D. They found that the NSC drug response was biphasic, but not for the human glioblastoma cell line (which showed more resistance in 3D). Here, two IC50 values were reported for the same curve in the case of primary cells, representing a situation where an IC50 (or GI/GR/EC50) is inappropriate. We would suggest an AUC or Emax instead, which are not dependent on curve fitting (figure 1b–d).

    As demonstrated by our own experimental data (figure 2d), patient-derived ovarian carcinoma ascite spheroids (OCAS) cultured in a non-degradable 3D hydrogel exhibited a low growth rate over the course of the assay. Although a drug response curve with mafosfamide was generated from the data (figure 2d), this does not mean that a valid GR50 value can be obtained. GR value curves need to pass through GR = 0.5 or they cannot have a reliable GR50 value, even if certain curve fitting software gives a value for these circumstances, as we demonstrate in figure 2d. Therefore, we recommend that only the online GR calculator [1] be used to calculate GR metrics from raw data to ensure that true GR metrics are reported. There are cases of drug screening in 3D environments [25] where the GR metrics could be applied, but since growth is often slower in 3D than in 2D, the GR calculations should be applied carefully. In contrast with our dosing of ovarian cancer cells above, we demonstrate in figure 2e an example where we encapsulated SKOV-3 cells grown in multicellular tumour spheroids (MCTS) in a 3D hydrogel and dosed with mafosfamide. In this case, the cell growth was high enough to calculate a GR50. From our own work, we recommend reporting the GR50 when possible to best account for differences in growth rates between different cell sources. We also encourage others to provide all the raw data and drug response curves with their publications to allow others to compare published results with their own (electronic supplementary material, table S2).
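    The requirement that a GR curve pass through GR = 0.5 can be checked before any value is reported. The sketch below is a simple stand-in that interpolates between measured points on a log10 concentration axis and returns nothing when the curve never crosses 0.5; it is not the sigmoidal fit performed by the online GR calculator [1], and the function name and data are hypothetical.

```python
import math


def gr50_by_interpolation(concs, gr_values, target=0.5):
    """Return the concentration where GR crosses `target`, or None if it never does."""
    points = sorted(zip(concs, gr_values))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        crosses = (g_lo - target) * (g_hi - target) <= 0 and g_lo != g_hi
        if crosses:
            # Linear interpolation in log10(concentration) between the two points.
            frac = (g_lo - target) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


concs = [0.01, 0.1, 1.0, 10.0]  # hypothetical doses (µM)
fast_growing = [0.9, 0.7, 0.3, -0.2]
slow_growing = [0.4, 0.35, 0.2, 0.1]  # every GR value is below 0.5

print(round(gr50_by_interpolation(concs, fast_growing), 3))  # 0.316
print(gr50_by_interpolation(concs, slow_growing))            # None
```

    Returning no value for the slow-growing series mirrors the situation in figure 2d, where a number produced by curve fitting software would not be a reliable GR50.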

    To further characterize the drug response in different material environments, we applied CRC metrics described by Inglese et al. [6] to our own data (figure 2f). We found that the r2 values that we obtained for the nonlinear curve fits to the data were less than 0.9, which meant it was not possible to fit our results into these exactly defined classes according to the criteria set by Inglese et al. However, some of these drugs had greater than 80% efficacy, and displayed drug response curve with one (partial), two (complete) or no asymptotes (incomplete). We found that the 3D models, OCAS in a 3D non-degradable hydrogel and SKOV-3 MCTS in a degradable 3D hydrogel, were in the same curve response class: ‘partial’ even though their GR metrics were very different. Additionally, patient cells on TCPS had a ‘complete’ response (class 1a), but it was not possible to calculate biologically meaningful GR metrics. Interestingly, at the range of concentrations that were tested, the case of SKOV-3 cells on TCPS was classified with an ‘incomplete’ response (class 2a), but all the traditional dose-response and GR metrics could be calculated. These additional metrics could be helpful for eliminating cases for further study in biomaterials when there are no responses (class 4), but we do not show any examples of that here. Characterization with CRC could be used as another method for grouping drug curve responses on biomaterials.

    4. Evaluation of drug responses in biomaterials reported in the literature

    To compare IC50 values across studies, we mined data from 25 reports that performed drug screening with biomaterial systems, and that provided raw data that could be extracted and analysed independently. We calculated the IC50, EC50, Emax and AUC values and organized them by drug in electronic supplementary material, table S1. We were not able to calculate GI50 and GR50 because the initial population (y0) and control (yctrl) values were not provided in these studies. Table 1 shows a summary of the reported IC50 values from the literature and the drug response metrics calculated by us from their reported data. Table 2 summarizes cases where the drug response curve did not reach 50% inhibition, meaning the IC50 was not reached. Table 3 summarizes cases where the IC50 values reported in the literature did not agree with the values we calculated from reported data. We also applied the CRC metrics described by Inglese et al. [6] to the data we extracted from the literature (tables 1 and 3).

    Table 1. Variation of IC50 reported for cell line-drug responses in different publications. Note: partial response, 1 asymptote; incomplete, 0 asymptote; complete, 2 asymptotes.

    calculated
    drug cell line format reported IC50 IC50 range reported IC50 range calculated IC50 Emax (growth inhibition) AUC EC50 R2 curve group
    paclitaxel MCF7 2D 0.003 µM 0.003–8.3 µM 0.003–0.009 µM 0.003 µM 78% @ 11 µM 224.70 0.004 µM 0.97 partial [21]
    2D 3.2–8.3 µM 0.009 µM 71% @ 0.2 µM 156.40 0.006 µM 0.95 incomplete [18]
    2D 10.9 nM 55% @ 500 nM 385.2 18 nM 0.96 partial [28]
    3D GFR Matrigel 0.006 µM ND-0.006 µM 0.0028–0.005 µM 0.005 µM 34% @ 191 µM 452.20 0.008 µM 0.97 partial [21]
    3D PEG-heparin 2.81 nM 37% @ 500 nM 381.9 2.33 nM 0.91 complete [28]
    MDA-MB-231 2D 0.004 µM 0.004–7.5 µM 0.002–4.6 µM 0.002 µM 97% @ 50 µM 139.60 0.004 µM 0.98 partial [21]
    2D 17.97 nM 45% @ 500 nM 418.8 13.59 nM 0.86 partial [28]
    2D 7.5 µM 4.6 µM 97% @ 25 µM 2.375 5.26 µM 0.89 incomplete [29]
    3D GFR Matrigel 0.03 µM 0.03–50 µM 0.03–50.3 µM 0.03 µM 74% @ 199 µM 335.70 0.04 µM 1.00 complete [21]
    3D PEG-heparin 4.41 nM 37% @ 500 nM 411.9 7.31 nM 0.94 complete [28]
    3D fibroin silk 50 µM 50.3 µM 58% @ 60 µM 3.326 25 µM 0.98 incomplete [29]
    doxorubicin MCF7 2D 0.23 µM 0.23–0.84 µM 0.3–0.88 µM 0.3 µM 100% @ 112 µM 340.60 0.3 µM 0.98 complete [17]
    2D 0.84 µM 0.88 µM 99% @ 117 µM 249.6 0.66 µM 1.00 incomplete [30]
    3D GFR Matrigel could not be calculated ND–70 µM 14.2–60 µM 14.2 µM 100% @ 239 µM 457.60 8.8 µM 1.00 partial [17]
    3D Matrigel EC50 = 12 not reached 41% @ 4 µM 237.3 1.76 µM 1.00 incomplete [31]
    3D agarose 70 µM 32.79 µg ml−1 (60 µM) 79% @ 115 µM 294.8 19 µM 1.00 partial [30]
    tamoxifen MCF7 2D 7.74 µM 7.74–14 µM 7.11–7.443 µM 7.11 µM 80% @ 100 µM 81.47 4.9 µM 1.00 partial [32]
    2D 14.0 µM 7.443 µM 99% @ 807 µM 163.6 14.49 µM 0.97 complete [33]
    3D 20.62 µM 20.6–72.6 µM 21–71.35 µM 21 µM 58% @ 90 µM 126.4 14.4 µM 0.99 complete [32]
    3D pNG cryogel 45.6 µM 45.09 µM 95% @ 800 µM 224.9 38.5 µM 0.99 complete [33]
    3D polyNIPAM spheroids 72.6 µM 71.35 µM 88% @ 80 µM 235.9 39 µM 0.99 complete
    epirubicin MDA-MB-231 2D 0.05 µM 0.05–0.6 µM 0.04–0.49 µM 0.04 µM 100% @ 1.1 µM 204.30 0.03 µM 0.97 partial [21]
    2D 0.6 µM 0.49 µM 85% @ 10 µM 287.7 0.41 µM 1.00 partial [34]
    3D GFR Matrigel 0.58 µM 0.58–0.6 µM 0.3–0.5 µM 0.5 µM 100% @ 52 µM 282.20 0.3 µM 0.99 complete [21]
    3D spheroid 0.6 µM 0.317 µM 70% @ 10 µM 285.3 0.28 µM 1.00 partial [34]
    docetaxel MDA-MB-231 2D 0.2 µM 0.2–31.4 µM 0.065–35 µM 0.065 µM 73% @ 10 µM 247.4 0.05 µM 1.00 partial
    2D 31.4 µM 35 µM 93% @ 123 µM 297.9 25.12 µM 0.96 incomplete [35]
    3D 10 µM not reached 29% @ 0.1 µM 312.4 could not determine 0.88 incomplete [34]

    Table 2. Examples where IC50 was not reached (drug concentration did not kill half the cells).

    drug | cell line | format | reported IC50 | calculated Emax (growth inhibition) | calculated AUC | calculated EC50 | calculated R2 | curve group | ref.
    doxorubicin | MCF7 | 3D | EC50 = 12 µM drug did not kill half of the cells (IC50 not reached) | 41% @ 4 µM | 237.3 | 1.76 µM | 0.99 | incomplete | [31]
    methotrexate | JIMT1 | 2D | IC50 not reported | 30% @ 20 µM | 406.80 | 0.02 µM | ND | incomplete | [12]
    helenine | JIMT1 | 2D | IC50 not reported | 11% @ 20 µM | 511.70 | 2.0 µM | ND | incomplete |
    API-2 | JIMT1 | 2D | IC50 not reported | 17% @ 20 µM | 398.70 | 11.1 µM | ND | incomplete |
    gemcitabine | BXPC3 | 3D | showed resistance, no IC50 | 6% @ 500 nM | 255.2 | 230 nM | ND | incomplete | [11]
    gemcitabine | Capan-1 | 3D | showed resistance, no IC50 | 25% @ 500 nM | 231.4 | 63 nM | 1.00 | partial |
    docetaxel | MDA-MB-231 | 3D | IC50 = 10 µM | 29% @ 0.1 µM | 312.4 | could not determine | 0.88 | incomplete | [34]

    Table 3. IC50 reported in a publication differed from that calculated by our laboratory independently.

    drug | cell line | format | reported IC50 | calculated IC50 | calculated Emax (growth inhibition) | calculated AUC | calculated EC50 | calculated R2 | curve group | ref.
    paclitaxel | SPCA-1 | 3D | 2.97 µM | 7.9 µM | 84% @ 8.5 µM | 19.7 | 7.75 µM | 0.99 | incomplete | [36]
    paclitaxel | 786-0 | 2D | 38.0–51.3 µM | 782 µM | 62% @ 0.1 µM | 212.20 | 0.04 µM | 0.97 | partial | [18]
    paclitaxel | SW620 | 2D | 2.6–3.6 µM | 0.006 µM | 100% @ 0.2 µM | 128.70 | 0.006 µM | 0.95 | partial |
    paclitaxel | HT29 | 2D | 2.8–4.7 µM | 0.003 µM | 100% @ 0.2 µM | 119.20 | 0.003 µM | 0.96 | partial |
    paclitaxel | HeLa | 2D | 2.3–7.4 µM | 0.02 µM | 98% @ 0.1 µM | 372.00 | 0.008 µM | 0.99 | partial |
    paclitaxel | SY5Y | 2D | 1.7–2.0 µM | 0.006 µM | 94% @ 0.1 µM | 377.10 | 0.005 µM | 0.99 | partial |
    paclitaxel | PC3 | 2D | 12.9–13.0 µM | 0.03 µM | 100% @ 0.2 µM | 159.90 | 0.02 µM | 0.99 | partial |
    doxorubicin | HEPG2 (cocultured with LX-2) | 3D | 50.4 µM | 24.5 µM | 69% @ 1957 µM | 341.20 | 17.6 µM | 0.99 | partial | [37]
    doxorubicin | HEPG2 (cocultured with LX-2) | 2D | 33.1 µM | 228.7 µM | 99% @ 213 µM | 421.90 | 27.4 µM | 0.89 | partial |
    doxorubicin + 0.1 µM calcipotriol | HEPG2 (cocultured with LX-2) | 3D | 6.9 µM | 3.92 µM | 76% @ 1660 µM | 317.10 | 5.89 µM | 0.92 | partial |
    doxorubicin + 0.1 µM calcipotriol | HEPG2 (cocultured with LX-2) | 2D | 43.7 µM | 624.8 µM | 96% @ 225 µM | 388.40 | 42.7 µM | 0.96 | partial |
    epirubicin | MCF7 | 3D | 0.5 µM | 9.1 µM | 98% @ 196 µM | 406.30 | 3.8 µM | 0.96 | partial | [21]

    First, we found that for highly potent drug–cell line combinations, such as MCF7 with paclitaxel or MDA-MB-231 with epirubicin, IC50 values reported were in the same order of magnitude (table 1). By contrast, when a cell line was not particularly sensitive to the drug, like in the case of the MDA-MB-231 cell line to paclitaxel or docetaxel and MCF7 treatment with doxorubicin or tamoxifen, IC50 values reported from different studies varied much more strongly. This variance appears to be more dependent on the potency of the drug than the platform in which the cells were treated. When drug sensitivity was moderate or low, wide ranges in IC50 values tended to be even more drastic in the 3D models compared to 2D.

    Another finding in table 1 is that highly efficacious drugs had ‘complete’ response curves, while less efficacious drugs had ‘partial’ or ‘incomplete’ curves. This is unsurprising, though there are some interesting observations: (i) all the drugs that failed to reach IC50 values had ‘incomplete’ curves; (ii) ‘incomplete’ or ‘partial’ response curves were mostly obtained in 2D models, while drugs tested in 3D models tended to display a ‘complete’ response curve. Paclitaxel results with MDA-MB-231 are a good example of this [20,27,38]. These results also illustrate the inadequacy of r2. In table 1, we present five cases (out of 28 cases in table 1) where r2 > 0.9, while the drug response curve had no asymptotes. So, it is unclear if Inglese et al.'s criteria apply here.

    One of the major challenges we encountered during our literature search was that a limited number of studies published their drug response curves. Some publications did not report IC50 values for cases where the drug concentration did not kill half of the cell population (table 2). Unsurprisingly, these drug response curves were all ‘incomplete’, with one exception. In most cases, there were too few points to even calculate an r2 value from the nonlinear fit. In these cases, r2 was reported as ‘ND’ (non-determined).

    Table 3 illustrates cases in which the IC50 we independently calculated did not agree with the one reported. This was mostly the case for cell lines that were fairly drug insensitive, as evidenced by the ‘partial’ response curves of these drugs. Despite the ‘partial’ response curve, Emax, IC50 and other metrics could be calculated for these drugs. This finding is significant, because it shows a pitfall of assessing drugs based on metrics without accounting for the drug response curve. The drugs in table 3 would seem efficacious based on their response metrics, although the raw response curves show that these cells are insensitive to these drugs. Finally, similar to table 1, the drug response curves in table 3 have r2 values of greater than 0.9, but only one asymptote (partial). This again shows that r2 alone is not an adequate criterion for CRC classification.

    The drug response metric values reported in tables 1–3 vary by study, and may depend on the type/length of assay, biomaterial used and/or analysis conducted. The most commonly used cell viability assays in our search included MTT assay [38], AlamarBlue (i.e. resazurin), Live/Dead staining and CellTiter-Glo. These types of assays indirectly measure the cytostatic or cytotoxic effect of a drug, via metabolic activity, counting of dead cells, cell death or ATP activity. There are additional complications with data reported in the literature. Many publications did not present enough data points for us to calculate IC50, and could not be included in the comparison. There were other published reports where no metrics were reported, which makes it impossible to relate them to other published data. Clearly, better standard practices should be adopted. We recommend that future publications explicitly define the metrics they use, for two reasons. First, clearly explaining the metrics used in an article would help others learn about drug response metrics, and it would also prevent them from misinterpreting results. Second, definitions of the metrics are dependent on the context. For example, in the articles we summarized, ‘inhibition’ in IC50 refers to the inhibition of cell viability. In other works, however, it may refer to the inhibition of cell growth, which should be called a GI50 and calculated accordingly.

    Among the 30 studies we examined, 25 presented drug response curves from which IC50 values could be obtained. We used the WebPlotDigitizer Tool (https://automeris.io/WebPlotDigitizer) [39] to extract drug concentrations and cell viabilities from these curves. These data were analysed in GraphPad Prism to calculate an IC50 using nonlinear regression with variable slope (four parameter) and least-squares fit method. From the data summarized in tables 1–3, we made comparisons between the 51 drug response curves and IC50 values reported in these studies. We found our results in agreement with 35 of these (69%), including five cases where neither we nor the original authors could obtain an IC50 value due to drug potencies being too low. In 16 cases (31%), IC50 values were significantly different between the value reported and our own calculation. These differences may arise because (i) we extracted the numerical data from plots in article figures, which may introduce error; (ii) different researchers may have used different forms of nonlinear regression (e.g. least-squares or robust fit methods for curve fitting, fixing the Hill slope to the standard −1 or using variable slope); (iii) other researchers may have chosen different methods (appropriate or not) to handle problems such as outliers and negative inhibition, including setting constraints on the maximum and minimum values, manually determining outliers, using software algorithms for automatic outlier detection, etc.; and (iv) there could be cases where the IC50 could not be calculated due to the shape of the fitted curve, yet some data analysis software will attempt to calculate an IC50 that results in an unrealistic value.
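
    Point (iv) can be made concrete with a minimal sketch. It assumes that viability is normalized to the untreated control (control = 1) and that the fit is a four-parameter logistic curve written with a negative Hill slope for inhibition. The function and parameter names are illustrative and are not taken from GraphPad Prism or from any of the studies we examined. The sketch does not perform the regression itself; it only shows when a set of fitted parameters yields a meaningful IC50.

```python
def four_param_response(conc, top, bottom, ec50, hill):
    """Four-parameter logistic response at concentration `conc` (> 0).

    A negative `hill` gives a decreasing (inhibition) curve; -1 is the
    'standard' slope mentioned in the text.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def absolute_ic50(top, bottom, ec50, hill, max_tested_conc=None):
    """Concentration at which the fitted curve crosses 0.5 (half of control).

    Returns None when the curve never reaches 0.5, or when the crossing
    lies beyond the highest concentration tested (an extrapolated value).
    """
    if hill == 0 or not (bottom < 0.5 < top):
        return None  # lower asymptote stays above 50% viability
    ratio = (top - 0.5) / (0.5 - bottom)
    ic50 = ec50 * ratio ** (-1.0 / hill)
    if max_tested_conc is not None and ic50 > max_tested_conc:
        return None  # value would be an extrapolation, not a measurement
    return ic50
```

    With a lower asymptote of 0.6, for example, the function returns None rather than a number, which is the behaviour we would want from analysis software when the drug never kills half of the population. Note that the fitted `ec50` (the midpoint between the two asymptotes) only coincides with the IC50 when the curve spans the full range from 1 to 0.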

    5. Assessing drug response in coculture systems

    Coculturing cancer cells with stromal cells (e.g. cancer-associated fibroblasts, pericytes or adipocytes) has been shown to drastically alter drug response, ranging from promoting drug resistance to increasing drug sensitivity [25,40–44]. Furthermore, multicellular cocultures may be more physiologically relevant than monocultures, either in 2D or 3D, as they can account for tissue-level interactions. In fact, it was recently shown that basal-like and mesenchymal-like subclasses of breast cancer could be distinguished based on their expected drug sensitivities, but only when cocultured with fibroblasts [41]. It is unclear how much complexity is required to accurately predict in vivo drug efficacy, but recreating aspects of the cellular TME is emerging as an important consideration due to the in vivo spatial heterogeneity. Logsdon et al. [40] found that MDA-MB-231 cells in mixed, 3D culture with fibroblasts were more resistant to 10 µM doxorubicin at low ratios of tumour to stromal cells (4 : 1) but equally affected by the drug at higher ratios (1 : 4). Shen et al. [45] found similar results using a micro-patterned interface of tumour to stromal cells wherein MCF7 cell proliferation was inhibited by reversine at the interface but not in the bulk. Expanding these datasets to evaluate a range of drug doses would provide insight into how dose response varies between the tumour bulk and regions of more diffuse invasion.

    One challenge with cocultures is the determination of drug response metrics for the discrete populations of cells. It is easiest to use cells expressing a reporter transgene or labelled with non-toxic dyes. Measurement of total fluorescence or bioluminescence then provides an estimate of the labelled cell number over the course of drug treatment [46,47]. However, dead cells may remain within 3D models, so the use of total fluorescence readings in these systems may be inaccurate. More appropriate in 3D is to stain cells using a viability marker like propidium iodide or JC-1 and quantify cell viability and/or number using either confocal/multiphoton microscopy or flow cytometry [40,41,44]. These methods can be used to track the drug response of one cell type while ignoring other cell types, or examine it for multiple cell types via multiplexing of different fluorophores. Coculture systems do require deciding which sample is more appropriate for calculating yctrl: a cancer cell only sample or a sample with all the cell types. Arguably, the respective untreated sample should be used for each treated sample to compensate for any effects of the stromal cells on cancer cell viability or growth rate. Additionally, the multiple centrifugation steps involved in harvesting and labelling cells for flow cytometry carry a risk of decreasing cell yield, such that it would be best to seed separate samples at the start of the study for determining an accurate y0.
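
    The sensitivity of growth-based metrics to the choice of yctrl and y0 can be illustrated with a short sketch. It assumes the commonly used form of the normalized growth rate inhibition (GR) value, GR = 2^(log2(y/y0) / log2(yctrl/y0)) − 1; the function name and the cell counts in the usage note are hypothetical and serve only as an illustration.

```python
import math


def gr_value(y_treated, y_ctrl, y0):
    """Normalized growth rate inhibition value for one drug concentration.

    y_treated: cell count (or proxy) of the treated sample at the end of the assay
    y_ctrl:    matched untreated sample at the end of the assay
    y0:        count at the time of drug addition (ideally a separately seeded sample)
    """
    if min(y_treated, y_ctrl, y0) <= 0 or y_ctrl <= y0:
        raise ValueError("need positive counts and a control that grew (y_ctrl > y0)")
    return 2.0 ** (math.log2(y_treated / y0) / math.log2(y_ctrl / y0)) - 1.0
```

    For a hypothetical treated count of 200 with y0 = 100, using a cancer-cell-only control of 400 gives a GR value of about 0.41, whereas using a coculture control of 800 gives about 0.26. The same treated sample therefore appears more or less inhibited depending on which untreated sample is taken as yctrl, which is why the matched untreated sample is arguably the appropriate reference.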

    Physically separating stromal and tumour cells using either conditioned media or culture inserts can isolate effects, but several studies have shown that direct cell–cell contact may be a crucial component of stromal-derived effects on cancer cells [45,48,49]. In mixed cultures, the most common methods to assess cell viability, such as MTT assay, AlamarBlue and CellTiter-Glo, measure the entire population of cells such that isolating the effects on only the cancer cells is not feasible. For example, by using CellTiter-Glo, Ngo & Harley [25] reported an increase in the overall growth rate of glioblastoma–endothelial cell cocultures with increasing temozolomide concentration, but it is unknown if both cell types contributed equally. It is possible the cancer cells responded the same as in monoculture (growth rate inhibition), while endothelial cells increased their growth rate, or that only endothelial cells were affected and therefore protected the cancer cells. Being unable to separate the responses of different cell types may not always be a significant drawback, but the presence of stromal cells has the potential to confound the results if overall survival apparently increases with drug treatment, as was reported by Yang et al. [46], such that IC50 or EC50 would be impossible to calculate. Thus, it may be more appropriate in these multicellular cultures to calculate GR50 or GI50 to determine overall drug responses across conditions.

    6. Conclusion

    Drug screening in biomaterials could be particularly useful in making better predictions in the early stages of preclinical drug development. However, it can be challenging to compare drug responses across different platforms and conditions in the current literature. This is, in part, a result of inconsistent applications of drug response metrics, and differences in cell growth rates for cells cultured in different biomaterials. For this reason, we suggest the use of GI50 and GR50 to account for initial populations (y0) and number of cell divisions during an assay, since cell growth highly impacts dose response. However, in instances when steady cell growth is not achieved, multiple drug response metrics could be applied (e.g. IC50, EC50, Emax and AUC) to account for possible experimental variation. To aid researchers in determining what drug response metrics can be calculated from their data, we suggest the use of a decision tree (figure 3) based on the traditional drug response curve and cell growth rate data that are obtained for a drug response experiment. First, visual inspection of the drug response curve determines whether an IC50 can be calculated, which requires that less than 50% of the control cell population remains at the highest drug concentration tested. If 100% cell death has been achieved, then the EC50 and IC50 will be equal. Furthermore, if the cells grew exponentially over the course of the drug screening assay, then the GI50 and GR50 metrics can be applied. We also encourage other research groups to include raw data and drug response curves in their reports, which will allow other researchers to gather additional data for their analyses. In the long term, this will lead to more accurate predictions, early in the drug development pipeline, of how likely a drug is to succeed in a clinical setting.
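
    The two criteria described above can be written as a small function. This is only a sketch of the criteria as stated in the text, not a full reproduction of figure 3; the function name, the dictionary keys and the tolerance used to decide that 100% cell death was reached are assumptions made for illustration.

```python
def applicable_metrics(min_normalized_viability, grew_exponentially, full_kill_tol=0.01):
    """Return which metrics the text's criteria allow for one drug response curve.

    min_normalized_viability: viability at the highest tested dose, relative to
        the untreated control (1.0 = no effect, 0.0 = complete kill)
    grew_exponentially: whether untreated cells grew exponentially during the assay
    full_kill_tol: hypothetical tolerance for treating a curve as 100% cell death
    """
    return {
        # Step 1: an IC50 requires the response to fall below half of control.
        "IC50": min_normalized_viability < 0.5,
        # With complete cell death, EC50 and IC50 coincide.
        "EC50_equals_IC50": min_normalized_viability <= full_kill_tol,
        # Step 2: growth-based metrics require exponential growth.
        "GI50": grew_exponentially,
        "GR50": grew_exponentially,
    }
```

    For instance, a curve that bottoms out at 30% viability in exponentially growing cells would permit IC50, GI50 and GR50, while a curve that plateaus at 70% viability in non-growing cells would permit none of these, leaving metrics such as Emax and AUC.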

    Figure 3. Decision tree for determining what drug response metrics can be calculated from drug response data. It is easiest to first look at a typical dose-response curve and calculate metrics from it. Then, depending on cell growth over the course of the assay, additional metrics may be calculated. In the first step, the criterion is whether the normalized cell growth decreased below 0.5, which is required for IC50 calculation. The criterion for the second step is whether exponential growth was achieved during the experiment. (Online version in colour.)

    Data accessibility

    The datasets supporting this article have been uploaded as part of the electronic supplementary material.

    Competing interests

    We declare we have no competing interests.

    Funding

    S.R.P., E.A.B., S.G. and M.F.G. were supported by an NIH New Innovator award (1DP2CA186573-01) and an NSF CAREER award (DMR-1454806). E.A.B. was partially supported by a fellowship from the National Institutes of Health as part of the University of Massachusetts Chemistry-Biology Interface Training Program (National Research Service Award T32 GM008515). J.M.M. and R.C.C. were supported in part by NCI R37 CA222563.

    Acknowledgements

    We would like to thank Kelly Stevens and Daniel Corbett at the University of Washington for providing microwell plates that we used to form MCTS for the experiments in figure 2. We would like to thank Hong Bing (Amy) Chen, Chien-I (Mike) Chang, and Cristian Fraioli at the Cancer Center Tissue and Tumor Bank at UMass Medical School for primary ovarian cancer ascites used for the experiments in figure 2. We would also like to thank Dr Aaron Meyer at UCLA for helpful discussions.

    Footnotes

    One contribution of 13 to a discussion meeting issue ‘Forces in cancer: interdisciplinary approaches in tumour mechanobiology’.

    Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.4518941.

    Published by the Royal Society. All rights reserved.