Classification of self-assembling protein nanoparticle architectures for applications in vaccine design

We introduce here a mathematical procedure for the structural classification of a specific class of self-assembling protein nanoparticles (SAPNs) that are used as a platform for repetitive antigen display systems. These SAPNs have distinctive geometries as a consequence of the fact that their peptide building blocks are formed from two linked coiled coils that are designed to assemble into trimeric and pentameric clusters. This allows a mathematical description of particle architectures in terms of bipartite (3,5)-regular graphs. Exploiting the relation with fullerene graphs, we provide a complete atlas of SAPN morphologies. The classification enables a detailed understanding of the spectrum of possible particle geometries that can arise in the self-assembly process. Moreover, it provides a toolkit for a systematic exploitation of SAPNs in bioengineering in the context of vaccine design, predicting the density of B-cell epitopes on the SAPN surface, which is critical for a strong humoral immune response.


Introduction
A promising route in the fight against major diseases, such as malaria [1,2], SARS [3], influenza [4], HIV [5] and toxoplasmosis [6], is a novel family of nanoparticle-based vaccines [7,8]. They rely on a special class of self-assembling protein nanoparticles (called SAPNs) that form from multiple copies of a purpose-designed protein chain, functionalized to present epitope antigens on the particle surface. Other approaches to designing protein-based nanoparticulate systems have been published by various research groups [9,10], and the architecture of such designs has been described with high accuracy [11,12]. A major challenge in the rational design of SAPNs lies in the control of their surface structures, as building blocks can self-assemble into a spectrum of different particle morphologies. Starting with the work of Raman et al. [13], several SAPN species have been synthesized, but in most cases their structures have not been completely determined, and nanoparticle populations are usually characterized only in terms of particle diameter. In some studies, the numbers of protein chains composing the particles have been identified. For example, Kaba et al. [1] and Raman et al. [13] report particles corresponding to assemblies of 60 chains; Pimentel et al. [3] describe SAPNs with 120 chains; Yang et al. [14] discuss species made of 180 and 300 chains; and finally, Indelicato et al. [15] report assemblies of 240, 300 and 360 chains. Smaller assemblies, so-called LCM units containing 15 protein chains, have also been discussed and reported [13,16]. However, an exhaustive enumeration of all possible nanoparticle morphologies that can arise from multiple copies of a given type of building block is currently lacking. This is a bottleneck in predicting the display of B-cell epitopes on the surface of SAPNs, and hence in rendering them optimal repetitive antigen display systems.
The challenge of enumerating all possible SAPN geometries is reminiscent of the one faced in the classification of virus structures. Similar to SAPNs, viruses assemble the protein containers that encapsulate their genomes (viral capsids) from multiple copies of a small number of different capsid proteins, in many cases a single type of capsid protein. These proteins typically group together in clusters of two, three, five or six in the capsid surface, akin to the clusters seen in SAPN architectures. Caspar & Klug's seminal classification scheme of viral architectures [17] relies on a geometric approach, predicting the spectrum of possible virus architectures in terms of the numbers and relative positions of these protein clusters (capsomeres) with reference to spherical surface lattices. This classification has revolutionized our understanding of virus structure, and plays a key role in the interpretation of experimental data in virology. It was developed for particles with icosahedral symmetry and, as such, can also be used for synthetic vaccines based on virus-like particles, but it is not suitable for modelling SAPNs.
We develop here a classification scheme for SAPN morphologies in terms of surface tessellations and associated graphs that pinpoint the positions of the protein building blocks in the particle surfaces. Our approach exploits the geometric relation of SAPN morphologies with fullerene architecture, and further develops tools that have been introduced for fullerene classification. As a result, we present a procedure to classify both symmetric and asymmetric SAPN morphologies, and we deliver a classification of the high- and low-symmetry particles seen in experiments. In particular, we explicitly determine particle morphologies for symmetric particles formed from up to 360 protein building blocks, as there is experimental evidence that spherical particles up to this size should exist, and these are relevant for vaccine design [1,3,14,15]. Defective nanoparticles are not considered in this work, as they require a different mathematical model, and will be the object of future investigation.

Self-assembling protein nanoparticle morphologies and their mathematical representation as spherical graphs
SAPNs are formed from multiple copies of a single protein building block (PBB) that is designed to self-assemble into particles via formation of specific cluster types. We focus here on SAPNs used in vaccine design, with PBBs given by pairs of linked helices (figure 1a). These are designed to interact via formation of trimeric and pentameric coiled coils involving, respectively, three (blue) and five (green) helices of different PBBs. SAPN architectures are thus characterized by the numbers and positions of these threefold and fivefold clusters. As the trimeric and pentameric coiled coils are connected in the PBBs, SAPNs can be represented as spherical graphs in which vertices mark trimer (black spheres in figure 1) and pentamer (white spheres) positions, and edges represent the PBBs connecting them. We refer to these graphs as nanoparticle graphs. In vaccine design, the PBB helices are functionalized, e.g. via an extension of the trimer-forming helices by viral epitopes as in the case of the SARS HRC1 [3]. Information on the positions of the trimeric coiled coils therefore provides insights into epitope locations in the nanoparticle surface. For example, figure 1c illustrates how nanoparticle graphs translate into SAPN morphologies, based on the example of a particle formed from 180 PBBs. It has 36 pentameric and 60 trimeric clusters, with epitope positions marked by black spheres. A classification of nanoparticle graphs thus provides an atlas of SAPN geometries and epitope positions.
Figure 1. (a) The SAPN building blocks consist of two fused polypeptide helices that cluster in groups of three (black sphere) and five (white sphere) in the nanoparticle shell. (b) Nanoparticle graphs correspond to spherical tessellations in terms of rhombs and hexagons, with vertices labelled alternatingly by black and white spheres. (c) A SAPN formed from 180 PBBs, together with its nanoparticle graph (adapted from a figure by N. Wahome and P. Burkhard). The nanoparticle model was built using a variety of adapted tools from the CCP4 program suite (www.ccp4.ac.uk/), the modelling software O (xray.bmc.uu.se/) and data from the RCSB database (www.rcsb.org/). The nanoparticle graph has been obtained by modifying a fullerene graph of the library of the FULLERENE PROGRAM [18].

Nanoparticle graphs as tilings
By construction, nanoparticle graphs have two types of vertices, $V_3$ and $V_5$, at which, respectively, precisely three or five edges meet. From a mathematical point of view, they are bipartite, (3, 5)-regular spherical graphs. Such graphs can be viewed as spherical surface tessellations (tilings) in terms of shapes that have an even number of edges connecting, alternatingly, vertices from $V_3$ and $V_5$. For the sake of simplicity, we focus our analysis on tessellations in terms of rhombs and hexagons (i.e. the shapes with the two smallest admissible numbers of edges, four and six), with vertices alternatingly marked by black and white spheres along their boundaries (figure 1b).
As each PBB corresponds to an edge in the nanoparticle graph, connecting a trimeric coiled coil (a vertex from $V_3$) with a pentameric coiled coil (a vertex from $V_5$), the number $N$ of its edges must satisfy $N = 3|V_3| = 5|V_5|$. This results in the restriction
$$|V_3| = 5m, \qquad |V_5| = 3m, \qquad N = 15m,$$
with $m \in \mathbb{N}$, implying that the number of PBBs in any particle must be a multiple of 15.
For a nanoparticle graph with $N = 15m$ chains, Euler's formula $f = 2 - v + e$ relates the numbers of vertices $v = |V_3| + |V_5| = 8m$, edges $e = N = 15m$ and faces $f$ of the corresponding spherical tiling, so that $f = 7m + 2$. As every edge borders exactly two faces, the face counts fulfil the condition $4r + 6x = 2e = 2N = 30m$, with $r$ and $x$ denoting the numbers of rhombs and hexagons, respectively. Together with $r + x = f = 7m + 2$, one obtains
$$r = 6m + 6, \qquad x = m - 4.$$
As the number of hexagons cannot be negative, this implies $m \geq 4$, and the nanoparticle with $N = 60$ is thus the smallest possible option. Its nanoparticle graph corresponds to a rhombic triacontahedron, i.e. an icosahedrally symmetric polyhedron with 30 rhombic faces, 60 edges, 12 fivefold vertices and 20 threefold vertices.
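The counting argument of this subsection can be checked mechanically. The following Python sketch (the function name `particle_counts` is ours and purely illustrative) derives, for a given chain number N, the numbers of trimers and pentamers from N = 3|V3| = 5|V5|, the number of faces from Euler's formula, and the numbers of rhombs and hexagons from 4r + 6x = 2N, which give r = 6m + 6 and x = m - 4.

```python
def particle_counts(n_chains):
    """Vertex and face bookkeeping for a nanoparticle graph with N = 15m edges (PBBs)."""
    if n_chains % 15 != 0:
        raise ValueError("N must be a multiple of 15")
    m = n_chains // 15
    if m < 4:
        raise ValueError("no rhomb/hexagon tiling exists for m < 4 (x = m - 4 < 0)")
    trimers, pentamers = 5 * m, 3 * m        # |V3| = N/3, |V5| = N/5
    vertices = trimers + pentamers           # v = 8m
    faces = 2 - vertices + n_chains          # Euler: f = 2 - v + e = 7m + 2
    return {
        "m": m,
        "trimers": trimers,
        "pentamers": pentamers,
        "faces": faces,
        "rhombs": 6 * m + 6,                 # r
        "hexagons": m - 4,                   # x
        "fullerene_vertices": 6 * m - 4,     # |V3| + x, used for the fullerene correspondence
    }


for N in (60, 180, 360):
    c = particle_counts(N)
    # consistency checks: r + x = f and 4r + 6x = 2N
    assert c["rhombs"] + c["hexagons"] == c["faces"]
    assert 4 * c["rhombs"] + 6 * c["hexagons"] == 2 * N
    print(N, c)
```

For N = 180 this gives 60 trimers and 36 pentamers, matching the particle of figure 1c, together with 8 hexagons and 78 rhombs.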

Nanoparticles and fullerenes
An exhaustive enumeration of nanoparticle graphs is a combinatorial challenge. We introduce here a method that relates SAPN geometries to those of fullerene cages, i.e. three-coordinated cages with vertices formed by carbon atoms. From a mathematical point of view, fullerenes correspond to three-regular spherical graphs with 12 pentagonal and otherwise hexagonal faces, and their geometries have been classified previously [18][19][20]. Using the method presented below, this classification of fullerene graphs can be used to derive a classification of SAPNs in terms of nanoparticle graphs.
From nanoparticles to fullerenes. To any nanoparticle graph $\mathcal{N}$ with isolated hexagons, i.e. in which hexagonal tiles do not share a vertex, a unique fullerene graph $\mathcal{F}$ can be associated via the following vertex addition rule (figure 2). In step one, a trimer is added at the centre of every hexagonal face and is connected to the white vertices (pentamers) on its boundary, resulting in a tessellation in terms of rhombs (graph $\mathcal{N}'$). In step two, every pair of black vertices (trimers) on the boundary of the same rhomb is connected along a diagonal of the rhomb. In step three, the white vertices $V_5$ and all edges of $\mathcal{N}'$ are removed. The remaining vertices $V_3'$, given by the union of $V_3$ (black vertices) and the (red) vertices added in step one, together with the edges added in step two, define the fullerene graph $\mathcal{F}$. The vertex addition rule relates the numbers of vertices, edges and faces of a nanoparticle graph to those of its fullerene graph counterpart according to table 1.
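The vertex addition rule can be sketched in a few lines of Python. The sketch assumes a hypothetical input format (not taken from the text): the nanoparticle graph is a list of faces, each a cyclic tuple of vertex labels, plus a predicate telling trimers from pentamers. As a test case it builds the smallest particle, the rhombic triacontahedron (N = 60), from a regular icosahedron (pentamers at the 12 icosahedron vertices, trimers at its 20 face centres, one rhomb per icosahedron edge); the rule should then return the dodecahedral fullerene C20.

```python
import itertools
import math
from collections import Counter

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_edges_and_faces():
    """Combinatorial icosahedron from the cyclic permutations of (0, ±1, ±φ)."""
    verts = []
    for s1 in (1, -1):
        for s2 in (PHI, -PHI):
            verts += [(0, s1, s2), (s1, s2, 0), (s2, 0, s1)]
    # adjacent vertices are exactly the pairs at squared distance 4 (edge length 2)
    edges = [(a, b) for a, b in itertools.combinations(range(12), 2)
             if abs(sum((x - y) ** 2 for x, y in zip(verts[a], verts[b])) - 4) < 1e-9]
    edge_set = {frozenset(e) for e in edges}
    # in the icosahedron every triangle of mutually adjacent vertices is a face
    faces = [t for t in itertools.combinations(range(12), 3)
             if all(frozenset(p) in edge_set for p in itertools.combinations(t, 2))]
    return edges, faces


def rhombic_triacontahedron():
    """Nanoparticle graph with N = 60: one rhomb (P, T, P, T) per icosahedron edge."""
    edges, tri_faces = icosahedron_edges_and_faces()
    rhombs = []
    for u, v in edges:
        f1, f2 = [f for f in tri_faces if u in f and v in f]
        rhombs.append((("P", u), ("T", f1), ("P", v), ("T", f2)))
    return rhombs


def vertex_addition_rule(faces, is_trimer):
    """Edge set of the fullerene graph associated with a nanoparticle graph."""
    fullerene_edges = set()
    for k, face in enumerate(faces):
        trimers = [v for v in face if is_trimer(v)]
        if len(face) == 4:
            # step 2: join the two trimers on opposite corners of the rhomb
            fullerene_edges.add(frozenset(trimers))
        elif len(face) == 6:
            # step 1: a new trimer at the hexagon centre splits it into three rhombs,
            # whose diagonals (step 2) join the centre to the three original trimers
            centre = ("hex", k)
            fullerene_edges.update(frozenset((centre, t)) for t in trimers)
        else:
            raise ValueError("only rhombs and hexagons are supported")
    # step 3 is implicit: pentamers and the edges of the nanoparticle graph are never kept
    return fullerene_edges


faces = rhombic_triacontahedron()
fullerene = vertex_addition_rule(faces, lambda v: v[0] == "T")
degree = Counter(v for e in fullerene for v in e)
print(len(faces), len(degree), len(fullerene), set(degree.values()))
```

The result has 20 vertices, 30 edges and all vertex degrees equal to 3, as expected for the dodecahedron $C_{20}$. The face-list representation was chosen because both steps of the rule act face by face.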
From fullerenes to nanoparticles. The above procedure is not always reversible. Reversal would require completion of the following three steps. In step one, the set $V_5$ of the nanoparticle graph is constructed by placing a vertex at the centre of each face of the fullerene graph $\mathcal{F}$, i.e. by adding the vertices of the dual graph of $\mathcal{F}$ to the vertices $V_3'$ of the fullerene graph. In step two, each such vertex is connected to those vertices from $V_3'$ that are located on the same face, and all edges of the fullerene graph are removed. This yields a bipartite graph $\mathcal{N}'$ with vertices of degree 3 ($V_3'$) and vertices of degree either 5 or 6 ($V_5$). Finally, in order to obtain a nanoparticle graph $\mathcal{N}$, vertices from $V_3'$ must be removed so that all vertices in $V_5$ have degree 5. This requires eliminating (colouring) exactly one vertex of the fullerene graph $\mathcal{F}$ on each hexagonal face, and none on the pentagonal faces; we refer to this as the vertex colouring rule in the following. Such a colouring may not exist, and when it exists it need not be unique. A necessary condition for a fullerene graph to result in a nanoparticle graph with $N = 15m$ edges via the vertex colouring rule is that the fullerene graph has $6m - 4$ vertices, corresponding to the sum $|V_3| + x = 5m + (m - 4) = 6m - 4$ of the numbers of trimers and hexagonal faces of the nanoparticle graph. In the following, we therefore classify SAPN morphologies, symmetric or not, starting from the fullerene graphs $C_{6m-4}$, which have been classified previously [18][19][20].
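The bookkeeping behind the vertex colouring rule can be sketched as follows; the function names are hypothetical. A fullerene $C_n$ has 12 pentagons and $n/2 - 10$ hexagons. A coloured vertex may not lie on a pentagon, so it touches three hexagons, and each hexagon carries exactly one coloured vertex; hence the number of coloured vertices is one third of the number of fullerene hexagons, which for $n = 6m - 4$ equals $m - 4$, the number of hexagons of the resulting nanoparticle. The second function checks the rule itself on a fullerene given as a list of faces.

```python
def colouring_bookkeeping(n):
    """For a fullerene C_n, return (m, N, fullerene hexagons, vertices to colour),
    or None if n is not of the form 6m - 4 with m >= 4."""
    if n < 20 or (n + 4) % 6 != 0:
        return None
    m = (n + 4) // 6
    fullerene_hexagons = n // 2 - 10      # C_n has 12 pentagons and n/2 - 10 hexagons
    # every coloured vertex touches three hexagons and every hexagon holds exactly
    # one coloured vertex, so this equals x = m - 4
    coloured = fullerene_hexagons // 3
    return m, 15 * m, fullerene_hexagons, coloured


def satisfies_colouring_rule(faces, coloured):
    """Exactly one coloured vertex per hexagonal face, none on pentagonal faces."""
    for face in faces:
        hits = sum(1 for v in face if v in coloured)
        if len(face) == 6 and hits != 1:
            return False
        if len(face) == 5 and hits != 0:
            return False
    return True


for n in (20, 44, 60, 140):
    print(n, colouring_bookkeeping(n))
```

Note that the Buckminsterfullerene $C_{60}$ is not of the form $6m - 4$, so it cannot give rise to a nanoparticle graph under this rule.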

Results
Fullerene cages can have varying degrees of symmetry, including the icosahedral symmetry of Buckminsterfullerene and carbon onions, the lower dihedral symmetries of prolate architectures, and the asymmetric shapes of fullerene cones. Similarly, nanoparticle graphs and SAPNs can vary in symmetry. We start with a classification of nanoparticle graphs with non-planar symmetries, i.e. those with more than one rotation axis of order three or higher (the polyhedral groups). Note that fourfold symmetry axes cannot occur. This is because neither vertices (of degree 3 or 5) nor hexagonal faces can lie on a fourfold axis, a fourfold rotation about the centre of a rhomb would exchange black and white vertices, and octagonal faces are excluded. Therefore, icosahedral and tetrahedral symmetries are the only possible non-planar options. Symmetry imposes strong restrictions on the number $N$ of edges of the nanoparticle graph, so that only particles with certain numbers of PBBs are allowed. In order to construct the nanoparticle graphs for these cases explicitly, we adapt methods used previously in the context of fullerene architecture. In particular, for the modelling of the icosahedrally symmetric nanoparticle graphs we adapt the Goldberg-Coxeter procedure [21,22], and for the tetrahedral graphs we use its extension to tetrahedral symmetry by Fowler et al. [23]. In each case, we first construct the fullerenes with the required symmetry and size, and then derive the corresponding nanoparticle graphs via the vertex colouring rule in figure 2.

Icosahedral nanoparticles
We first derive restrictions owing to symmetry. Consider the icosahedral group $I$ acting on the nanoparticle graph (embedded into a sphere). Denote by $t_d$ and $p_d$ the numbers of trimers and pentamers in generic positions in the fundamental domain, i.e. those not positioned on symmetry axes of the particle. Generic orbits of $I$ contain 60 points, whereas the threefold and fivefold axes meet the sphere in 20 and 12 points, respectively. Then, for the particle to have icosahedral symmetry, the following relationship has to be fulfilled:
$$N = 3(60\,t_d + 20\,\alpha) = 5(60\,p_d + 12\,\gamma).$$
Here, $\alpha = 1$ or $0$ indicates the presence or absence of trimers on the threefold axes of icosahedral symmetry, and $\gamma = 1$ or $0$ that of pentamers on the fivefold axes, respectively. Note that, as $I$ is a subgroup of the full icosahedral group $I_h$, this restriction also holds for nanoparticles with full icosahedral symmetry. There are only two solutions up to and including $N = 360$, given by $N = 60$ and $N = 360$. We use the Goldberg-Coxeter construction for fullerenes to determine the corresponding nanoparticle graphs. In this construction, a fullerene graph is represented as a superposition of an icosahedral surface (20 equilateral triangular faces) on a planar hexagonal grid such that the icosahedral vertices coincide with centres of the hexagonal tiles. An equilateral triangle whose corners are the centres of three mutually adjacent hexagons contains exactly one vertex of the fullerene graph, i.e. one carbon atom. Denoting the area of this triangle as $\Delta$, an equilateral triangle with vertices at $(0, 0)$ and $P = (i, j)$ has area $(i^2 + ij + j^2)\Delta$, and therefore contains $i^2 + ij + j^2$ vertices of the fullerene. Given that the planar net of the icosahedron contains 20 equilateral triangular faces, fullerenes with icosahedral symmetry are only possible if they have $20(i^2 + ij + j^2)$ vertices. As only fullerene graphs with $6m - 4$ vertices can correspond to nanoparticle graphs with $N = 15m$ chains (recall table 1), we obtain the condition
$$20(i^2 + ij + j^2) = 6m - 4.$$
The two possible solutions $N = 60$ and $N = 360$ correspond to isomers with $(i, j) = (1, 0)$ or $(i, j) = (0, 1)$, and $(i, j) = (2, 1)$ or $(i, j) = (1, 2)$, respectively. In each case, we construct the planar net and apply the vertex colouring rule.
In the first case, the nanoparticle graph has no hexagons and corresponds to the rhombic triacontahedron. In the second case, colouring compatible with icosahedral symmetry is indeed possible and results in two structures that are identical up to helicity (cf. table 2).
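Both icosahedral conditions can be combined in a short search. The Python sketch below (hypothetical function names) scans all multiples of 15 up to 360 and keeps those chain numbers that satisfy the orbit condition $|V_3| = 60t_d + 20\alpha$, $|V_5| = 60p_d + 12\gamma$ and for which $C_{6m-4}$ has $20(i^2 + ij + j^2)$ vertices; mirror-image pairs $(j, i)$ are omitted.

```python
def gc_pairs(T):
    """Pairs (i, j), i >= j >= 0, with i^2 + ij + j^2 = T (mirror images omitted)."""
    return [(i, j) for i in range(1, T + 1) for j in range(i + 1)
            if i * i + i * j + j * j == T]


def fits_orbits(count, special, generic=60):
    """Is count = generic * k + special * flag for some k >= 0 and flag in {0, 1}?"""
    return any(count - special * f >= 0 and (count - special * f) % generic == 0
               for f in (0, 1))


def icosahedral_candidates(n_max=360):
    results = []
    for N in range(15, n_max + 1, 15):
        # orbit condition: |V3| = 60 t_d + 20 alpha and |V5| = 60 p_d + 12 gamma
        orbit_ok = fits_orbits(N // 3, 20) and fits_orbits(N // 5, 12)
        # Goldberg-Coxeter condition: C_{6m-4} must have 20(i^2 + ij + j^2) vertices
        n_fullerene = 6 * (N // 15) - 4
        pairs = gc_pairs(n_fullerene // 20) if n_fullerene % 20 == 0 else []
        if orbit_ok and pairs:
            results.append((N, pairs))
    return results


print(icosahedral_candidates())
```

This returns $N = 60$ with $(i, j) = (1, 0)$ and $N = 360$ with $(i, j) = (2, 1)$. The Goldberg-Coxeter condition on its own would also admit $N = 210$, whose fullerene $C_{80}$ has $(i, j) = (2, 0)$; the orbit condition rules it out, since $|V_3| = 70$ is not of the form $60t_d + 20\alpha$.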

Tetrahedral nanoparticles
As before, we first derive symmetry restrictions on $N$. Denoting by $t_d$ and $p_d$ the numbers of trimers and pentamers in generic positions in the fundamental domain, the symmetry condition is
$$N = 3(12\,t_d + 4\,\alpha + 4\,\beta) = 5 \cdot 12\,p_d,$$
since generic orbits of the tetrahedral rotation group $T$ contain 12 points, each type of threefold site contains 4 points, and pentamers cannot lie on symmetry axes. Here $\alpha, \beta \in \{0, 1\}$ indicate the absence or presence of trimeric clusters on the two types of threefold sites. Note that these correspond, respectively, to corners and centres of faces of a tetrahedron. The solutions specify the allowed chain numbers for particles with tetrahedral symmetry. Up to and including 360 chains, these are $N = 60$ for $(\alpha, \beta, t_d, p_d) = (1, 1, 1, 1)$; $N = 120$ for $(0, 1, 3, 2)$ and $(1, 0, 3, 2)$; $N = 180$ for $(0, 0, 5, 3)$; $N = 240$ for $(1, 1, 6, 4)$; $N = 300$ for $(0, 1, 8, 5)$ and $(1, 0, 8, 5)$; and $N = 360$ for $(0, 0, 10, 6)$. By table 1, these correspond to fullerenes $C_n$ with $n = 20, 44, 68, 92, 116, 140$. Note also that, because $T$ is a subgroup of the tetrahedral groups $T_h$ and $T_d$, the above restrictions also hold for nanoparticles with higher tetrahedral symmetry. Fullerenes with tetrahedral symmetry can be constructed via the Fowler-Cremona-Steer construction [23], which is based on the superposition of the surface of a polyhedron with tetrahedral symmetry onto a planar hexagonal tessellation, as shown in figure 4. The polyhedral surface corresponds to the union of three types of triangles, two equilateral and one scalene, which are characterized via a quadruplet of integers $(i, j, h, k)$ as follows: the four large equilateral triangles are given as in the Goldberg-Coxeter construction via $(i, j)$, and the four small equilateral triangles by $(h, k)$ (points P and Q in figure 4).
Figure 5. Atlas of tetrahedral and icosahedral nanoparticles; the depicted domains are the union of three fundamental domains of the tetrahedral group (cf. figure 4).
We construct the planar nets for all tetrahedral solutions above, using the $(i, j, h, k)$ vectors from the Fowler-Cremona-Steer classification (table 2), and check whether the vertex colouring procedure can be applied to obtain a nanoparticle graph. Note that the colouring is not always possible, and that in some cases several different nanoparticles correspond to the same fullerene. We list all resulting nanoparticle graphs with at least tetrahedral symmetry in table 2 and provide the corresponding atlas in figure 5. We give an explicit example of the full net of a tetrahedral fullerene graph and its associated nanoparticle graph (electronic supplementary material, figures S1 and S2).
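The tetrahedral chain numbers can be enumerated directly from the counting condition $N = 3(12t_d + 4\alpha + 4\beta) = 5 \cdot 12\,p_d$. The following Python sketch (the function name is hypothetical) loops over $p_d$, which fixes $N = 60p_d$, and over the occupation flags $(\alpha, \beta)$, and also reports the size $n = 6m - 4$ of the fullerene to be coloured.

```python
import itertools


def tetrahedral_candidates(n_max=360):
    """Solutions (N, (alpha, beta, t_d, p_d), n) of N = 3(12 t_d + 4 alpha + 4 beta) = 60 p_d."""
    results = []
    for p_d in range(1, n_max // 60 + 1):
        N = 60 * p_d                              # pentamers only in generic orbits of size 12
        for alpha, beta in itertools.product((0, 1), repeat=2):
            rest = N // 3 - 4 * (alpha + beta)    # trimers left for generic orbits
            if rest >= 0 and rest % 12 == 0:
                results.append((N, (alpha, beta, rest // 12, p_d), 6 * (N // 15) - 4))
    return results


for row in tetrahedral_candidates():
    print(row)
```

The output reproduces the eight solutions listed above, from $(1, 1, 1, 1)$ with $C_{20}$ to $(0, 0, 10, 6)$ with $C_{140}$. These are necessary conditions only; whether a tetrahedrally symmetric colouring exists still has to be checked on the planar nets.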

Particles with lower symmetry
The procedure introduced above allows one to construct nanoparticle graphs with arbitrary symmetry, or with no symmetry at all. In particular, nanoparticle graphs with rhombic and hexagonal faces cannot have sixfold axes (vertices have degree 3 or 5, and a sixfold rotation about the centre of a hexagon would exchange black and white vertices), so that neither sixfold rotational symmetry nor $D_6$ symmetry is possible. By contrast, particles with $D_5$ and $D_3$ symmetry can occur. Particles with $D_5$ symmetry must fulfil the necessary condition
$$N = 3 \cdot 10\,t_d = 5(10\,p_d + 2\alpha) = 15m,$$
where $\alpha = 1$ when the two sites of fivefold symmetry are both occupied by pentamers (and $\alpha = 0$ otherwise), and $p_d$, $t_d$ have the same meaning as before. Note that the exclusion of decagonal tiles implies $\alpha = 1$. There are only three possible solutions for chain numbers up to and including 360: $N = 60$ (and $C_{20}$) for $(p_d, t_d, m) = (1, 2, 4)$; $N = 210$ (and $C_{80}$) for $(p_d, t_d, m) = (4, 7, 14)$; and $N = 360$ (and $C_{140}$) for $(p_d, t_d, m) = (7, 12, 24)$. As before, models of fullerenes with $D_5$ symmetry can be constructed by superimposing the general planar net of a polyhedron with such symmetry onto a hexagonal tessellation of the plane (cf. [23]). This again requires the specification of four integers $(i, j, h, k)$, and the corresponding values are listed in table 3. Note that the nanoparticle corresponding to $C_{20}$ is the classical icosahedral solution (the rhombic triacontahedron), while the isomer of $C_{80}$ with coordinates $(1, 0, 2, 1)$ results in two different particles with $D_5$ symmetry. Only two isomers of $C_{140}$ yield solutions upon colouring, giving four particles in total: three with $D_5$ symmetry and one with icosahedral ($I$) symmetry. All colourings generating nanoparticles with at least $D_5$ symmetry are listed in table 3.
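The three $D_5$ solutions follow from a one-parameter search over the condition $N = 3 \cdot 10\,t_d = 5(10\,p_d + 2\alpha)$ with $\alpha = 1$. A minimal Python sketch (hypothetical function name), which also returns $m$ and the fullerene size $6m - 4$:

```python
def d5_candidates(n_max=360, alpha=1):
    """Solutions (N, p_d, t_d, m, n) of N = 3 * 10 t_d = 5(10 p_d + 2 alpha)."""
    results = []
    for t_d in range(1, n_max // 30 + 1):
        N = 30 * t_d                  # trimers only in generic orbits of size 10
        rest = N // 5 - 2 * alpha     # pentamers left after the two fivefold sites
        if rest >= 0 and rest % 10 == 0:
            m = N // 15
            results.append((N, rest // 10, t_d, m, 6 * m - 4))
    return results


print(d5_candidates())
```

It returns $(60, 1, 2, 4, 20)$, $(210, 4, 7, 14, 80)$ and $(360, 7, 12, 24, 140)$, i.e. the three cases listed above with their fullerenes $C_{20}$, $C_{80}$ and $C_{140}$.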
Regarding $D_3$ symmetry, the necessary condition is
$$N = 3(6\,t_d + 2\alpha) = 5 \cdot 6\,p_d,$$
where $\alpha \in \{0, 1\}$ indicates the absence or presence of trimeric clusters on the particle's threefold axis. Inspection of the Fowler-Cremona-Steer construction shows that the two sites of threefold symmetry must both be occupied by trimers, so that $\alpha = 1$. The resulting particles are listed in table 4. Nanoparticles with lower symmetry can also occur in all of these cases if the vertex colouring rule is applied to the associated fullerene graphs $C_{6m-4}$ in such a way that their symmetry is reduced or broken. An example of this is provided in the electronic supplementary material, figure S3, showing all ways in which the symmetry of the icosahedral particle with 360 chains can be reduced. This demonstrates how the procedure developed here can be used to determine all lower-symmetry alternatives for any of the higher-symmetry particles listed in the previous subsections.
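An analogous search can be sketched for $D_3$. It assumes the counting condition $N = 3(6t_d + 2\alpha) = 5 \cdot 6\,p_d$ with $\alpha = 1$, which follows from the orbit structure of $D_3$ (generic orbits of size 6, two poles on the threefold axis, and no vertices on the twofold axes, since both vertex degrees are odd). The function name is hypothetical, and the output is only a list of chain numbers passing this necessary condition; which of them are realized by an actual colouring has to be checked on the fullerene nets.

```python
def d3_candidates(n_max=360, alpha=1):
    """Chain numbers passing the assumed D3 condition N = 3(6 t_d + 2 alpha) = 30 p_d."""
    results = []
    for p_d in range(1, n_max // 30 + 1):
        N = 30 * p_d                  # pentamers only in generic orbits of size 6
        rest = N // 3 - 2 * alpha     # trimers left after the two threefold sites
        if rest >= 0 and rest % 6 == 0:
            m = N // 15
            results.append((N, p_d, rest // 6, m, 6 * m - 4))   # (N, p_d, t_d, m, n)
    return results


print(d3_candidates())
```

Under these assumptions the candidates up to 360 chains are $N = 60, 150, 240, 330$, with fullerenes $C_{20}$, $C_{56}$, $C_{92}$ and $C_{128}$.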

Exploitation in the context of vaccine design
These results pave the way for the optimization of SAPN morphologies for applications in vaccine design. To generate an optimal humoral immune response, repetitive antigen display is a key determinant [24][25][26][27][28][29][30][31][32]. SAPNs represent an ideal model for repetitive antigen display. They are similar to virus-like particles, but offer more flexibility in protein design, so that different architectures can be tested relatively easily. B-cell epitopes can be attached to either end of the protein chain and will thus be displayed close to the trimer or pentamer vertices of the particular SAPN architecture. The geometries outlined above allow a straightforward calculation of the distances between epitopes. These distances define the epitope density, which in turn is related to the strength of the immune response. Several decades ago, in their landmark publication, Dintzis et al. [33] related the epitope density to the so-called immunon, a determinant of the strength of the immune response. Based on our results, we can estimate the average distance between trimers and between pentamers by a simple density argument. Given a nanoparticle graph $\mathcal{N}$ with $N = 15m$ edges, consider the associated graph $\mathcal{N}'$, in which each hexagon is replaced by three rhombs (figure 2), so that $\mathcal{N}'$ has only rhombic faces, specifically $9m - 6$ of them (by table 1). Assuming that all rhombs are approximately equal, with approximately the same area, shape and side lengths, the area $A$ of a spherical rhomb on the surface of a sphere of radius $r$ can be estimated as
$$A \approx \frac{4\pi r^2}{9m - 6}.$$
Given the area of the rhomb, and using spherical geometry, we obtain the average distances between trimers and between pentamers on a sphere of radius $r$ listed in table 5. The epitopes can be on either end of the protein chain, i.e. on the trimer or on the pentamer. Identical epitopes will, however, always be on the same oligomerization domain.
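One way to turn the rhomb area into distances is sketched below in Python; it is our own illustrative reconstruction and not necessarily the procedure behind table 5. It assumes equal spherical rhombs whose angle at every trimer corner is $2\pi/3$ (all trimers in $\mathcal{N}'$ have degree 3), obtains the angle at the pentamer corners from Girard's theorem (area $/\,r^2$ = angle sum $- 2\pi$), and then uses Napier's rules for the four right spherical triangles cut out by the diagonals. The trimer-trimer and pentamer-pentamer distances are the two diagonals, returned as arc lengths on the sphere of radius $r$.

```python
import math


def rhomb_estimates(m, r=1.0):
    """Equal-rhomb estimate: (area, trimer-trimer arc, pentamer-pentamer arc)."""
    n_rhombs = 9 * m - 6
    area = 4 * math.pi * r ** 2 / n_rhombs
    theta = 2 * math.pi / 3                        # every trimer is shared by three rhombs
    # Girard's theorem: area / r^2 = 2 theta + 2 beta - 2 pi
    beta = area / (2 * r ** 2) + math.pi - theta   # angle at the pentamer corners
    # the diagonals bisect the angles and cut the rhomb into four right spherical
    # triangles; Napier's rules give the half-diagonals (in radians)
    half_tt = math.acos(math.cos(beta / 2) / math.sin(theta / 2))
    half_pp = math.acos(math.cos(theta / 2) / math.sin(beta / 2))
    return area, 2 * half_tt * r, 2 * half_pp * r


for m in (4, 8, 24):
    print(m, rhomb_estimates(m))
```

For $m = 4$ the estimate is exact: the rhombic triacontahedron has pentamer angles of $2\pi/5$, and the two diagonals reproduce the edge arcs of the dodecahedron and of the icosahedron. For larger $m$ the single pentamer angle averages over pentamers of degree 5 and 6, so the values are only approximate; chord lengths, if needed, follow as $2r\sin(d/2r)$.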
Computer modelling and experimental analysis have shown that, for a SAPN with N = 60, the radius of the central cavity, i.e. of the region where the two coiled coils are joined together, is about 3 nm. The dimension of the central cavity will increase with the number of protein chains per particle. Also, the B-cell epitope will not be located on top of the vertices but rather roughly on top of the individual α-helical axes. The distance of the α-helical axes from the trimer and pentamer axes is about 0.65 nm and 0.85 nm, respectively [34,35]. These values have to be subtracted from the calculated distances between two trimer vertices and between two pentamer vertices, respectively, in table 5.
If the B-cell epitope itself is a coiled-coil trimer, as for example in the SARS vaccines [3], then we can calculate the distance between adjacent B-cell epitopes for a given length of the coiled coil. For instance, in the SARS nanoparticle with N = 120 and a helix length of about 7 nm, the distance between epitopes located at trimeric sites would be about 4.6 nm. If the B-cell epitope is not a coiled coil (which has a rather extended shape), then its particular dimensions will also have to be taken into consideration. A folded protein domain quite likely has a roughly spherical shape; for example, a protein such as lysozyme is about 3.5 nm in size. By choosing a particular SAPN architecture, the B-cell epitopes can then be placed in an array with a rather precise spacing that depends on the lengths of the SAPN coiled coils. This gives the vaccinologist a tool to optimize the vaccine for the best immune response.

Discussion
The classification presented here provides, to our knowledge, the first complete atlas of SAPN geometries of $D_3$ symmetry or higher, and provides a construction method for all particles, including low-symmetry and asymmetric ones. We have demonstrated previously that a combinatorial analysis of SAPN structures can be an invaluable tool in the interpretation of experimental data. In particular, biophysical methods such as analytical ultracentrifugation can provide information on the numbers of chains N in the particles that occur in the self-assembly process. Combinatorics then narrows down the spectrum of options to a limited ensemble of particle geometries compatible with this range of chain numbers, and identifies the precise surface structures of the particles in terms of the placements of all protein chains and threefold and fivefold coiled coils. It also offers a glimpse at the complexity of the assembly process in terms of the numbers of different particles that can occur in a given range of chain numbers. In previous work [15], a full classification had not yet been available. It was therefore only possible to identify possible candidates for the particles seen in experiment, but an exhaustive enumeration was not possible.
The construction method with reference to fullerene architecture introduced here provides a step change. It offers for the first time, to our knowledge, insights into the full spectrum of particles of arbitrary size and morphology occurring in an experiment. This exhaustive approach therefore opens up opportunities for the analysis of experimental data that had not been possible before. For example, it is now possible to apply statistical mechanics approaches and construct partition functions describing the outcome of the assembly experiments. These can be used to better understand the assembly process itself in terms of the most likely, dominant assembly pathways. This, in turn, will provide pointers for experimentalists on how to optimize the assembly procedure, e.g. in terms of the yield of desired particle types. The detailed insights into the connectivity of each chain in the nanoparticle surface moreover enable computer reconstructions of the nanoparticles, as in the example in figure 1c. These can then be used to engineer specific architectures by controlling the rigidity of the links and the angle between the coiled coils (an issue not addressed here).
Most importantly, however, the results obtained here enable the identification of SAPN morphologies that have not yet been synthesized, and thus enable the rational design of desired particle morphologies. In particular, our approach links SAPN morphologies with epitope positions, and therefore provides a tool for the identification of SAPN morphologies with optimal properties for vaccine design. However, if the SAPNs are co-assembled from different chains, i.e. if they are composed of epitope-decorated units and protein chains lacking epitopes, then the assembly forms will be much more difficult to predict. Depending on the B-cell epitope, chains carrying epitopes may cluster together if the B-cell epitopes attract each other. Also, we do not exclude the possibility that SAPNs may be formed that have an irregular assembly form of protein chains owing to imperfect propagation of the lattice in all directions. If so, this would lead to chimeric forms of SAPNs with respect to their architecture as described here.
Data accessibility. The paper is a theoretical study and is self-contained.
Authors' contributions. G.I., P.B. and R.T. designed research; G.I. and R.T. performed research; and G.I., P.B. and R.T. drafted the manuscript. All authors gave final approval for publication.
Competing interests. P.B. is CEO of the company Alpha-O Peptides and has patents or patents pending on the technology.
Alpha-O Peptides did not fund any of this research. The paper is a mathematical paper that refers to self-assembling protein nanoparticles (SAPNs), a technology that has been patented by P.B.
Funding. Funding for G.I.'s visit to York via Italian National Group of Mathematical Physics (GNFM-INdAM) and by EPSRC grant EP/K028286/1 is gratefully acknowledged. R.T. also thanks the Royal Society for a Royal Society Leverhulme Trust Senior Research Fellowship (LT130088).