Tensor clustering with algebraic constraints gives interpretable groups of crosstalk mechanisms in breast cancer

We introduce a tensor-based clustering method to extract sparse, low-dimensional structure from high-dimensional, multi-indexed datasets. This framework is designed to enable detection of clusters of data in the presence of structural requirements which we encode as algebraic constraints in a linear program. Our clustering method is general and can be tailored to a variety of applications in science and industry. We illustrate our method on a collection of experiments measuring the response of genetically diverse breast cancer cell lines to an array of ligands. Each experiment consists of a cell line–ligand combination, and contains time-course measurements of the early signalling kinases MAPK and AKT at two different ligand dose levels. By imposing appropriate structural constraints and respecting the multi-indexed structure of the data, the analysis of clusters can be optimized for biological interpretation and therapeutic understanding. We then perform a systematic, large-scale exploration of mechanistic models of MAPK–AKT crosstalk for each cluster. This analysis allows us to quantify the heterogeneity of breast cancer cell subtypes, and leads to hypotheses about the signalling mechanisms that mediate the response of the cell lines to ligands.


I. INTRODUCTION
Muti-dimensional datasets are now prevalent across the sciences, and their importance will only continue to grow [1][2][3][4].This affluence of data demands new methods that respect multi-dimensional structures and exploit them.We introduce a versatile new data clustering framework based on tensors (high dimensional arrays) and algebra to analyze multi-dimensional datasets.
One key feature of this method is that, given application specific constraints on the composition of a cluster, it is guaranteed to find the optimal partition.The flexibility of the method allows it to be used directly on a dataset (i.e., as a standalone clustering tool), or in combination with other clustering methods.
We showcase our method on a set of time-course measurements of the activation levels of the MAPK and PI3K pathways, by analyzing the phosphorylation of ERK and AKT respectively, in 36 breast cancer cell lines after exposure to 14 ligands (growth factors) [1].Because the dataset is complete (i.e., there is a measurement for every combination of times, proteins, cell lines, ligands, and doses) we can represent it as a tensor in five dimensions (Fig. 1A).We exploit this tensorial representation to explore the crosstalk between signaling pathways that are known to dysfunction in cancer following their exposure to specific growth factors [7][8][9][10]16].Although the key signaling proteins and subtype responses in breast cancer cells are known, the dynamics and mechanisms of signal transduction networks across genetically diverse cell lines are not well understood [1,11,12].These mechanisms are important to understand because they have been shown to be related to cellular decisions and fates [10][11][12][13].We then perform a systematic and exhaustive search of mathematical models for each cluster from a large pool of candidates.
Our analysis consists of two main steps.First, we use tensors and algebra to find clusters of experiments subject to structural or interpretability constraints (Fig. 1B,C).Second, we perform a systematic search for nonlinear ordinary differential equation (ODE) models that reproduce the key dynamical features of the time series in each cluster (Fig. 1D).
We find clusters of experiments with similar response to the stimuli using a novel notion of tensor similarity.We then use integer programming to find clusters that are globally optimal.These partitions can be visualized by color-coding the entries of the tensor according to their cluster assignment.We impose the structural constraint that each cluster be rectangle-shaped on the tensor (although possibly disconnected).This constraint ensures that that clusters match a subset of the cell lines (which have various mutations) with the subset of growth factors whose effect on the signaling pathway is altered by the mutation.If two experiments, belong to the same cluster, then the diagonally opposite entries in the tensor must also belong to the same cluster (see Fig. 1C).The constraints allow us to demand, for example, that measurements inside a cluster are consistent with the same biological mechanism.One of the strengths of our approach, is that it can work with pre-existing non-rectangular partitions of the data obtained from established methods (e.g., k-means, spectral methods, community detection on graphs).Starting from an initial partition is computationally advantageous and our method is able to find the nearest optimal rectangular clustering.
Once an optimal partition of the data into clusters is obtained, we look for mechanisms that can explain the behavior observed in each cluster.We explore differences between clusters to understand the variation of the signaling mechanisms across cancer cell lines.To this end, we construct, parametrize, and rank models for each cluster from a pool of 729 candidate models that best describe the data.

II. TENSORS AND ALGEBRA
Data tensor.We represent a multi-indexed dataset as a tensor Z of order h in the real numbers with size n 1 × . . .× n h (i.e., Z ∈ R n1×...×n h , where n i ∈ N, and i = 1, . . ., h).When the data is complete, every entry of the tensor is filled with a number.A full treatment of tensors is available in [1] and references therein.We introduce here the tensor theory required for our analysis.
Similarity tensors.In a similarity matrix the entry (i, j) records the pairwise similarity of the two items labelled by unidimensional indices i and j.We now introduce the high-dimensional generalization of a similarity matrix, which extends this to multi-indexed data.Suppose we want to compute the similarity of the data indexed by i = (i 1 , i 2 ) and that indexed by j = (j 1 , j 2 ): where

Algebraic interpretability condition.
When clustering a set of data points we typically seek to find a partition such that the points within a cluster are more similar (or close) to each other than to the rest of the data [17].In the simplest cases, there are few restrictions on the clusters other than the similarity or distance be reflected in the cluster assignments.In certain cases, imposing restrictions on the clusters can be desirable or even required.Here we pursue structured clustering; we impose restrictions on what the clusters can look like using a tensor.These restrictions help us to interpret clusters in terms of data-generating mechanisms (e.g., find the model that best describes the observations in each cluster).
A hard partition of a dataset represented as a tensor Z of size n 1 × • • • × n h into c clusters can be encoded in two ways: which the data has multi-indices i = (i 1 , . . ., i d ) and j = (j 1 , . . ., j d ), and: x ij = 0 if i and j belong to the same cluster, 1 otherwise. ( The tensor X can be seen as a boolean approximation of the distances between pairs of data points: x ij = 0 if i and j are 'close' (in the same cluster), and x ij = 1 if they are 'far' (in different clusters).
To ensure that X encodes a valid clustering of the data, the three conditions of an equivalence relation must be met.These conditions are given by the following algebraic equations and inequality: Reflexivity: x ii = 0, Symmetry: We require that m k=1 y ik = 1, to ensure that each data item has been assigned to exactly one cluster.
The tensors X and Y are related by the following equation Integer Optimization.The structural or interpretability conditions we have imposed on the clusters take the form of linear constraints.These constraints, along with the fact that the tensors are boolean, allow us to find optimal tensors X and Y by solving an integer linear program [18,19].Specifically we use the branch and cut algorithm [20] as we describe in the Structured clustering section below.

III. DATA
We examine an extensive experimental dataset detailing the temporal phosphorylation response of signaling molecules in genetically diverse breast cancer cell lines in response to different growth factors [1].This dataset is complete and can be represented by a tensor Z of order 5 whose dimensions correspond to 36 cell lines, 14 ligands, 2 doses, 3 time points, and 2 proteins (pERK, pAKT) (see the SI Appendix).In this work each experiment is a set of measurements for each cell line/ligand combination (i.e., 36 • 14 = 504 experiments).Our goal is to find sets of experiments with a similar response; consequently, the data structures we require are the following: Z ∈ R 504×12 , (flattened data tensor) S ∈ R 504×504 .

IV. STRUCTURED CLUSTERING
Given the similarity tensor S, we now compute the best partition into clusters subject to interpretability constraints.The constraint we impose requires the clusters to be rectangular-shaped with respect to cell lines and ligands (see eq. ( 5)).We describe the biological motivation in the results, and mathematical details of the method here.
We present two implementations, one that requires no previous knowledge about clustering assignments of experiments and provides an optimal clustering directly from the data.The other begins with a pre-existing partition of the experiments into clusters (not necessarily compliant with the constraints), which might originate from any clustering method (i.e., using the reshaped similarity tensor S).The implementation then reconstructs S and finds the nearest optimal clustering that complies with the constraints.Starting with an initial clustering has the advantage that it is less computationally expensive than starting with no prior knowledge of the clusters (see SI Appendix).
The clustering assignments are recorded by the tensor X defined in equation ( 2).The rectangular-shaped interpretability condition corresponds to three simultaneous algebraic constraints on the entries of X: We search over arrays X that satisfy these conditions, and find the one which best maximizes internal similarity, according to the following criterion.Experiments in the same cluster should have high similarity, so we maximize the similarity between experiments in the same cluster.This maximization is equivalent to solving the integer optimization problem where tensors X and S are as above, and •, • denotes the entry-wise inner product.The 504 2 ×1 vector vec(X) is the vectorized form of X, and 1 is the tensor of ones with the same size as X.The coefficient λ is a regularization term introduced to control the number of clusters.The matrix V encodes the constraints on X given in eq. ( 3) and eq.( 5).This matrix has over 1 million rows, 504 2 columns and is extremely sparse.The kth row of V represents the kth constraint to the values of vec(X): the entry is the coefficient (which can be 0, 1, or −1, depending on the constraint) with which each entry of vec(X) appears in the constraint.The kth entry of b l and b u (which can be 0, 1, or 2) give the lower and upper bounds respectively of each linear inequality.We solve this optimization program using the branch and cut algorithm [20] via the IBM ILOG CPLEX Optimization Studio [21].
The resulting rectangle-shaped clusters are a sparse, low-rank representation of the data.The tensor 1 − X of size (36 × 14) × (36 × 14) gives a binary measure of the distance between any two experiments.This tensor has sparse block structure: it consists of m cuboids of 1s along the diagonal, where m is the number of clusters, and has zeros everywhere else.As a consequence X has low multilinear rank [22], bounded above by (m, m, m, m), which is less than the maximum possible value of (36,14,36,14).

B. Pre-existing clusters
Our method can find interpretable, structured clusters from an initial unstructured partition of the data.As before, we find structured clusters using linear integer optimization.The input is an initial partition of the 504 experiments into m clusters that do not have to be rectangular.We then modify the cluster assignments to reach the closest possible interpretable, structured clustering.
Mathematically, this implementation is set up in the following way.The initial clustering is encoded by a partition tensor, W, of size 36 × 14 × m w ik = 1 i is in cluster k 0 otherwise where i = (i 1 , i 2 ) indexes an experiment.The new clusters are encoded by a tensor Y of the same size (defined according to equation ( 4)).In order to have rectangular clusters, the entries of Y must satisfy the conditions m r=1 yijr = 1, (unique cluster assignment) As before, we use the branch and cut algorithm to obtain the global optimum for the optimization problem The inner product W, Y sums the number of clustering assignments unchanged by converting the initial unstructured clustering into a clustering that satisfies the interpretability constraints.
We obtain the tensor Y, of size 36 × 14 × m by solving the optimization problem in eq. ( 7).As with X, the tensor Y also has sparse and low-rank structure.Its m two-dimensional slices, each a matrix of size 36×14, have rank two and block structure with a rectangle shape populated by 1s and all other values equal to 0.

V. RESULTS
Each experiment in our data is indexed by (c, l), where c is the cell line, and l is the ligand.A high similarity between experiments suggests the possibility of a common underlying biological mechanism.This is the basic notion that underpins the constraints in our clustering method.The biological hypothesis is that if experiments (c 1 , l 1 ) and (c 2 , l 2 ) are in the same cluster, it is possible that cell lines c 1 and c 2 share a feature such as a genetic mutation.This feature causes them to respond similarly to ligands l 1 and l 2 , and the genetic mutation is relevant to the part of the signaling pathways these ligands activate.Such biological causality would imply that if we swapped the ligands, experiments (c 2 , l 1 ) and (c 1 , l 2 ) should also respond in a similar way.The biological mechanism that gives rise to the similarity between (c 1 , l 1 ) and (c 2 , l 2 ) can be discounted if it is not also reflected in experiments (c 1 , l 2 ) and (c 2 , l 1 ); which is why we require that the clusters of experiments have a rectangular shape.

A. Interpretable groups by mutation and receptor subtype
In a clinical setting, prognosis and treatment decisions for breast cancer are guided by tumor grade, stage and clinical subtype [23], which is based on the presence of cellular receptors: • HER2 amp cells are characterized by amplification of the HER2 gene leading to over-expression of the ErbB2 receptor tyrosine kinase • HR + cells are characterized by the expression of the estrogen receptor (ER) or progesterone receptor (PR) • TNBC, or triple negative breast cancer, cells are negative for HER2 amplification and express ER and PR only at low levels We compare the clusters from our method with the three standard clinical subtypes above.We also compare our clusters with the mutational status of the cell lines [24,25], and with their drug response [26].
Clustering clinical subtypes.We first investigate a fine-grained classification within each of the three clinical subtypes.A summary statistic between 0 and 1 (based on the cosine similarity, see SI Appendix) quantifies the within-class variation for each clinical subtype.A score of 0 indicates complete homogeneity, and 1 indicates complete heterogeneity.The HER2 amp cell lines show comparatively little variation, with an average difference score of 0.086.The TNBC and the HR + cell lines have an average difference score of 0.224 and 0.334.We obtained clusters without prior knowledge of an initial clustering by solving the optimization problem (6).The results (shown in Fig. 2A and 2B) identify heterogeneity within each subtype as well as cell lines of particular interest.
The TNBC cell lines are divided into 12 clusters (Fig. 2A) which mirror the heterogeneous behavior of TNBC in the clinic [27].All but one TNBC cell lines with a PTEN mutation appear in the green cluster.The only exception is the HCC1937 cell line which has a PTEN mutation but appears in the yellow cluster.The cluster assignment of cell lines MDA-MB-231 and MDA-MB-157 is markedly different that the other cells across the ligands.These assignments might be explained by the mutational status of the cell lines; MDA-MB-231 is the only cell line with an NF2 mutation or a BRAF mutation, whereas MDA-MB-157 is the only cell line with an NF1 mutation.The bright orange cluster contains five cell lines (all but HCC1937) with the same two CDKH2A mutations.
Figure 2B shows the clustering of the HR + cell lines.Cell line MDA-MB-415 stands out for its response to so-called high-response ligands [1] (ligands to the left of HRG in Fig. 2B).Among all cell lines, MDA-MB-415 has the second highest susceptibility to the drugs Ixabepilone, Methylglyoxal and PD [26].CAMA-1 cell line, is distinctive in its response to the low-response ligands (to the right of HRG), which might help explain why it is particularly susceptible to both (Z)-4-Hydroxytamoxifen and TCS PIM-11 [26].
The HER2 amp cell lines cluster together for all ligands except for the MDA-MB-361 cell line.This is the HER2 amp cell line most resistant to HER2-targeted therapy such as Lapatinib [26].In fact, its resistance to Lapatinib exceeds that of some TNBC cell lines (HCC2185 and MDA-MB-453).The grouping of the rest indicates the consistency among all other HER2 amp cell lines (see SI Appendix).
Clustering all cell lines.We now apply our method to the full dataset.Here we solve the optimization problem (7), which means that we rely on an initial clustering to speed up computations (see the SI Appendix).The groups we find respect the broad division of the cell lines seen in Fig. 2A,B, which is a sign of the consistency between the two implementations of our method.
We obtained our initial clustering by first constructing a graph of experiments using the Relaxed Minimum Spanning Tree algorithm [2][3][4].Then we used the Markov Stability community detection method [5,6] to obtain the clusters.Markov Stability identifies robust partitions of the experiments into three, five and seven groups (see SI Appendix).
From the initial partition into three clusters, we obtain three rectangular clusters (Fig. 2C).Of these, we find that two groups of ligands correspond to previously reported high active expression profiles (yellow and green) and one to muted profiles (blue), respectively [1].Within the more highly active group, the HR + cell lines are predominantly in the yellow cluster, while the HER2 amp cells are in the green cluster.This separation of the HR + and HER2 amp clinical subtypes is entirely data driven and supports the notion that our method is indeed able to find interpretable groups.The cell lines that are not clustered according to their subtype reflect previous findings that neither growth factor responses nor sensitivity to drugs that target signal transduction pathways is uniform within clinical subtypes [1,12,18].The TNBC cell lines are divided between the yellow and green clusters, providing further evidence of the heterogeneity in TNBC cell lines [18,[34][35][36][37][38].
When we use initial clustering into five groups, the rectangular clusters split the ligands into a low response group (blues) and high response (green, yellow, brown).This split is nearly the same as we obtained before (Fig. 2D).Note that the difference in the ligand HGF may be due to the fact that it is not part of the ErbB nor the FGF families of ligands.The HER2 amp cell lines are now all assigned to the green cluster and there are only three HR + cell lines not assigned to the yellow cluster.A new brown cluster consists of cell lines: MDA-MB-175-VII (classified as a HR + ), UACC-812 (HER2 amp ), 1845B5 (TNBC) and HS578T (TNBC).While none of them have the same cell classification or genetic mutation, all cell lines in the brown cluster show high suscep- tibility to Gefitinib [26].Note that MDA-MB-175-VII is the only HR + cell line that is not assigned to the yellow group in either three or five clusters; this might be due to the fact that this cell line carries a unique chromosomal translocation.The translocation leads to the fusion and amplification of neuregulin-1 which signals through ErbB2/ErbB3 heterodimers [39,40], and could be the underlying cause of the cell line's unique sensitivity to ErbB-targeting drugs like Lapatinib or Afatinib [12,18].Finally, the clustering that begins with an initial partition into seven groups shows high consistency with the five cluster case (see SI Appendix).We therefore continue our analysis on the five clusters.

B. Systematic model identification
We now analyze the response of the five structured groups found in the previous section (Fig. 2D) to obtain a mechanistic insight about how the cell line/ligand combinations in them respond to the stimuli.We perform systematic model analysis of 729 possible network models, and find that with our data 44 are structurally identifiable, a prerequisite for performing parameter estimation and model selection.Then, we parametrize, rank and choose the models that best represent each cluster's response.As a result, we have a list of candidate signaling mechanisms for each cluster which provides more information than the statistical predictions of the sensitivity of MAP Kinase drug targets (e.g., ErbB drug class) [18].Models of the MAPK and AKT pathways have been studied under a variety of biological and modeling assumptions [8,9], including pathway crosstalk [13,[15][16][17].
Here we consider simple models to ensure the parameters are at least locally identifiable so there are a finite number of parameter values to fit the data.See the SI Appendix for a brief synopsis of MAPK models.
We construct nonlinear ordinary differential equation models to describe the dynamics of the AKT and ERK signaling pathways.These models include three molecular species: Receptor (R), pERK (E) and pAKT (A).Since the data contains the response of pERK and pAKT, we assume that the receptor must phosphorylate ERK and/or AKT.We consider positive, negative, or no interaction between pERK and pAKT under different types of kinetic regimes (mass-action or Michaelis-Menten) and different types of inhibition (blocking/sequestration or removal/degradation).The combination of these features result in the 44 structurally identifiable models that we study in further detail.Each model corresponds to a different mechanistic hypothesis of the dynamics in the Top 2 nd   Cluster number 1 2 3 4 5 Strength of kinetics: pathways (see SI Appendix).To find the models that best describe the response of each of the five clusters, we estimate parameters using the Squeeze-and-Breathe algorithm [21], and rank them using the Akaike information criterion score (AICc) (see SI Appendix).The best models for each cluster are shown in Fig. 3.
The AICc penalizes more complex models; therefore it is not surprising that the top models are the simplest ones.The best models for each cluster have different feedback strengths (parameter values) and network topologies (see Fig. 3); this supports the hypothesis that mutations may play a role in the dynamics.Although the values of the parameters vary, the model with arrows from the receptor to pAKT and pERK appears in all clusters, which is in line with how cells are understood to operate.We remark that cluster 4, which corresponds to HR + cells (yellow in Fig. 2D), includes inhibition crosstalk as the second best model, whereas in all other clusters this mechanism appears in fourth place.These results support the notion that clusters may have different parameter values/network structure and that our approach is viable.As more comprehensive datasets become available, for example including the receptor dynamics, and more molecular species, we will be able to incorporate more complex models to our approach.

VI. DISCUSSION
We presented a framework for analyzing multi-indexed data that allows structural constraints to be incorporated.This structural imposition also relates to machine learning techniques in the sense that it is an unsupervised, data-driven clustering method, since the groups into which the data is divided are not labeled at the start.However, the method allows the user to exercise more control than in many unsupervised methods due to the pre-defined constraints on what the final clusters can look like.
We tested this method on an extensive dataset charting the response of genetically diverse breast cancer cell lines to ligands.We identified both similarities an heterogeneities within clinical subtypes.The heterogeneity seems to be related to both the mutational status of the cells as well as their response to inhibitors.This means that similar analyses in patient tissues might identify patients that respond differently to therapeutics commonly used within a clinical subtype.By analyzing clusters from all subtypes, we also showed that we cannot explain the dynamics of each data cluster with only one mathematical model.
Our methodology is general and can be adapted to a variety of structural restrictions.For example, when there are size restrictions on the clusters, or to impose/prohibit particular combinations of data.Subject to such constraints, the method finds an optimal clustering via a metric of the user's choosing.
As a summary statistic of the three clinical subtypes, we compute the average distance score within each subtype.The score for the 11 HER2 amp cell lines is obtained as follows.There are 154 = 11×14 experiments in the dataset that involve HER2 amp cell lines, each consisting of 12 measurements.For each pair of experiments, we find the dissimilarity between the 12 measurements, using cosine dissimilarity 1 -dot(v1,v2)/(norm(v1,2)*norm(v2,2)).The average    [1].
Clustering Her2amp straight from the data dissimilarity is obtained by averaging these pairwise distances across the 154 2 = 11781 pairs.Similarly for the HR + cell lines and for the TNBC cell lines.The averages obtained are 0.086 for HER2 amp , 0.334 for HR + and 0.224 for TNBC.The partitioning of the HER2 amp cell lines is given in Fig. S2.

Pre-existing clusters a. Computing pre-existing clusters
We find an initial clustering of the experiments.For this initial clustering, we label each experiment by a single index.The data for the ith experiment is: where AKT 1 i is the normalised time series of fold-change response of pAKT under dose 1ng/ml, and so on.We compute the 504 × 504 similarity matrix S, in which s ij indicates the cosine similarity of experiments i and j: where zi = Z(i, :) and zj = Z(j, :).The entries of zi and zj are nonnegative, which means that , experiments i and j have an identical response to the treatments in both AKT and ERK (up to a scaling constant).When s ij = 0, the data for the experiments are orthogonal.The task of clustering the experiments faces two challenges: the number of clusters is not known a priori, and the matrix S is full matrix and is noisy due to experimental error.To tackle these challenges we use a combination of tools from manifold learning and network science.We create a network (graph) in which each of the 504 experiments is represented by a node, and where connections exist between similar experiments.We first define the dissimilarity matrix D (d ij = 1 − s ij ).We then use the Relaxed Minimum Spanning Tree (RMST) algorithm [2][3][4], which extracts a network representation from high-dimensional point clouds (in this case the zi ) that are embedded in a lower dimensional manifold.Specifically, the algorithm creates an undirected, unweighted network with an edge between i and j if they are neighbors in a minimum spanning tree (MST) from D. The algorithm adds extra edges to the network if they are consistent with the continuity of the data, i.e. if the distances between the points in D is comparable to their separation in the MST and is consistent with the continuity of the data, according to the equation Here mlink ij is the maximal edge weight in the MST path connecting i to j, and k i is the the distance to the nearest neighbour of i (i.e., the minimum value on the ith row of D, excluding d ii ).Basically, what the RMST algorithm does is allows edges to be added to the MST (it 'relaxes' the MST), so that we obtain an network description of high dimensional data that is embedded on a lower dimensional manifold.
Once we have obtained the network from the similarity matrix, we extract communities using the Markov Stability (MS) community detection algorithm [5,6].This method employs continuous time random walks of varying duration (Markov time) to extract communities of the network at different levels of resolution.Shorter Markov times produce small communities, whereas longer Markov times lead to coarse partitions of the network.Obtaining the optimal partition of a network into communities is an NP complete problem, so MS uses heuristics to find communities.Because there is no guarantee of finding a global optimum, MS repeats the heuristic search 100 times for each Markov time.The variability in each set of 100 solutions is measured with the Variation of Information (VI) [7].A low value of the VI for a Markov time indicates that the solutions obtained are similar to each other, we take this similarity as a sign that there is a robust partition of the network for this Markov time.In Fig. S3 we show that the network A has a robust partition into 3, 5 and 7 communities.

b. Structured clusters from pre-existing clusters
We present the clustering assignments after using MS to obtain our initial partition into clusters.The pre-existing and structured cluster results for three clusters (Fig. S4), five clusters (Figs.2D, S4) and seven clusters (Fig. S6).
A finer clustering into seven groups (see Fig. S6) divides the ligands into three groups: high response (yellow, green, brown) {BTC, EGF, EPR, FGF-1, FGF-2, HGF, HRG} and lower response (blues) {IGF1, IGF2, INS} and { NGF-β, PDGF-BB, SCF, VEGF175}.The assignment of cell types is remarkably similar to the five cluster results.The exception cell lines are: HCC1419 (green to brown), and ZR-75-30 (green to the new seventh cluster).Given this consistency, we restrict to mechanistic interpretation of the five cluster case.

Comparison of implementations
When we employ our method assuming no initial clusterings, the integer optimization is learning the values, 0 or 1, for an array of size c × c × l × l (where c = #cell lines and l = # ligands).When we use an initial clustering, CPLEX uses the same branch and cut algorithm on an array of size c × l × k, where c = #cell lines, l = #ligands and k = #clusters, so the array is much smaller for the pre-existing clustering implementation.We show a comparison of the performance of the two implementations in Fig. S7 Three clusters on all 36 cancerous cell lines The MAP Kinase pathway has been widely studied.Models have focused on various features of the cascade, such as its three-tier phosphorylation feedback structure [8,9].The activation profile of these kinases is directly related to cellular decisions and fates [10][11][12][13].The AKT pathway has been modeled by EGF-dependent activation, including phosphorylation of AKT (pAKT) and its downstream intracellular proteins [14].Few models of crosstalk between ERK and AKT exist.One model was created for studying PC12 cells, and they found that AKT acts as a low-pass filter which decouples the EGF signal [15].Another model was created to study HEK293 cells in the presence of a MEK inhibitor; they found crosstalk is reinforced between Ras and PI3K [16].Another model found that JNK is regulated by AKT and MAPK feedbacks in these pathway [17].Chen and co-authors constructed and analyzed an ErbB model focusing on the receptor dynamics and early activation response of the MAPK and AKT pathways in response to ligands in A431 or H1666 cells [13]; however their model was unidentifiable, meaning there were an infinite number of parameter values to fit the data [13].Weakly activated models of MAPK activation cascades with optimal amplification under a variety of stimuli were analyzed in [19].

FIG. S5. Five clusters from Markov Stability
Seven Clusters from Preliminary RMST

All wiring diagrams
We consider all possible wiring diagrams to describe the interactions between the receptor, the Erk pathway and the Akt pathway.These can be written as a wiring-diagram where we assume an arrow exists between the ligand L and receptor R: We first consider all possible network topologies with interactions between the variables R, E and A. There are three possibilities for the directed interaction from one variable to another: positive (→), negative ( ), or no significant interaction (no arrow).There are six potential directed interactions in the network which give a total of 3 6 = 729 networks.
Many of these networks can be ruled out.The data shows some response in pERK and pAKT for each stimuli, therefore we require that both pERK and pAKT have at least one arrow coming into each of these (to ensure a response).Given this restriction from the data, we can eliminate many networks, for example a network where the receptor inhibits pAKT and pERK is biologically infeasible.We also do not have dynamic data of the phosphorylated receptor, therefore we cannot distinguish a network topology that has an arrow feeding back from the phopho-form to the receptor; thus we fix the interaction from pERK to R and pAKT to R to none (no arrow).All of these restrictions produce Table III.
Based on these wiring diagrams, we now consider different possible kinetics for each arrow, summarized in the Table, and described in the next subsection.

Construction of mechanistic models
We construct systems of ordinary differential equation models to describe interaction dynamics between the receptor (R), pAKT (A) and pERK (E).The equation describing the evolution of the phosphorylated receptor is dR/dt = αL(R tot − R) − δR.The total amount of receptor, R tot , is estimated from the receptor abundance data.The unphosphorylated receptor is given by (R tot − R).The parameter α determines the rate at which it is phosphorylated by the ligand dose (L = 1 or 100 The time evolution of pERK and pAKT can be activated by R. by averaging the receptor abundances for that cell line, for each of the receptors associated to the ligand.We do this averaging over all receptor, cell line pairs for which we have data.
We then estimate the receptor abundance for a cluster by averaging the values obtained above for each cell line, ligand pair that is present in the cluster.

Identifiability analysis
Before estimating model parameters from the data, we determine whether a model is identifiable.Models that are globally identifiable have parameters that are uniquely identifiable under ideal data conditions.Models that are locally identifiability have a finite number of indistinguishable parameter values.Since we only have time-course measurements for A and E, we use a differential algebra approach for eliminating the species R. We test identifiability using the algorithm DAISY [20].All of our models are locally, if not globally identifiable given the experimental data.Globally identifiable models are denoted by brown boxes in Fig. S8.

FIG. 1 .
FIG. 1. Schematic of the tensor clustering method and model identification procedure.
i 1 , j 1 ∈ {1, . . ., n 1 } and i 2 , j 2 ∈ {1, . . ., n 2 }.In general, for data indexed by the first d dimensions, we have the multi-indices i = (i 1 , . . ., i d ) and j = (j 1 , . . ., j d ).The dimensions of Z can be re-orderded as needed.We can construct a similarity tensor S of order 2d.The shape of S is determined by the chosen dimensions of the data: S ∈ R n1×...×n d ×n1...×n d .The similarity tensor and the similarity matrix are related by flattening the tensor as follows.The original data tensor Z can be flattened (re-shaped) into a data matrix Z ∈ R N1×N2 , where N 1 = d r=1 n r , and N 2 = h r=d+1 n r .Each row of Z is a N 2 -dimensional vector that corresponds to multiindex i, and the length N 2 is the product of the dimensions of Z that are not included in i.The similarity matrix between the rows of Z is S ∈ R N1×N1 , which is obtained by flattening the similarity tensor, S. We summarize this relationship in the following diagram:To compute the similarity tensor S, we can simply flatten the data tensor Z into Z, construct a similarity matrix S, and then reverse flatten it into the desired S. Note that Z and S have the same number of entries as Z and S respectively.Example.Let Z ∈ R 10×5×3 be a tensor of order 3.If i = (i 1 , i 2 ) is the multi-index, then d = 2, N 1 = 10 • 5 = 50, and N 2 = 3.The (order 4) similarity tensor S has size 10 × 5 × 10 × 5.The similarity matrix S has size 50 × 50.The flattened data matrix Z has size 50 × 3.

2 .
In a n 1 × . . .× n d × m tensor Y where y ik = 1 if the data indexed by i belongs to cluster k, 0 otherwise.

FIG. 2 .
FIG. 2. Tensor-based structured clustering.(A) TNBC clustering with no prior clustering information.(B) HR + clustering with no prior information.(C) Clustering of all cell lines starting from an initial partition into three clusters.(D) Clustering from an initial partition into five clusters.

<x10 - 5 1 FIG. 3 .
FIG. 3. The top four models for each cluster.The strength of interactions are indicated by the size of the arrow.The grey arrows indicate a blocking mechanism for inhibition.Black inhibition arrows indicate a removal mechanism for inhibition.
FIG. S1.Time course measurements of cell line MCF7 exposed to two doses of betacellulin (BTC).

30 FIG
FIG. S2.Clusters from the data of HER2 amp cell line.
FIG.S3.Number of communities and variation of information for the network obtained from RMST similarity graph.
FIG. S4. A. The clustering assignments are represented by yellow, green and blue squares.B. The tensor clustering are a close approximation to the clusters obtained with MS.
FIG. S7.Computational complexity of two implementations with eight ligands and a varying number of cell lines.
FIG. S8.Mechanistic models of breast cancer cell lines.The name of network model is indexed first by whether it is a two arrow (A), three arrow (B), or four arrow (C) model.The second index assigns a number to each network topology.Each network can be further subdivided to describe a model with mass action kinetics (1 in third index) and a model with Michaelis-Menten kinetics (2 in third index).Any network with inhibition ( ) has 4 models considered: mass action removal (1 in third index), Michaelis-Menten removal (2 in third index), mass action blocking (1b in third index), or Michaelis-Menten removal (2b in third index).Stars next to the model name are globally structurally identifiable models, all other models are locally structurally identifiable.

TABLE I .
Breast cancer cell lines used in the data set[1].

TABLE II .
Ligands used in the data set