State-space reduction and equivalence class sampling for a molecular self-assembly model

Direct simulation of a model with a large state space will generate enormous volumes of data, much of which is not relevant to the questions under study. In this paper, we consider a molecular self-assembly model as a typical example of a large state-space model, and present a method for selectively retrieving ‘target information’ from this model. This method partitions the state space into equivalence classes, as identified by an appropriate equivalence relation. The set of equivalence classes H, which serves as a reduced state space, contains none of the superfluous information of the original model. After construction and characterization of a Markov chain with state space H, the target information is efficiently retrieved via Markov chain Monte Carlo sampling. This approach represents a new breed of simulation techniques which are highly optimized for studying molecular self-assembly and, moreover, serves as a valuable guideline for analysis of other large state-space models.


Introduction
The oft-cited quote 'We are drowning in information but starved for knowledge' implies that information has become so abundant that it is nearly impossible to draw useful conclusions from it.   The availability of affordable, high-performance computational resources allows for the simulation of sophisticated models with enormous state spaces. A researcher can, therefore, generate huge volumes of data with relatively little effort. On the other hand, the 'target information' desired by a researcher may be so deeply embedded in this dataset that it cannot be isolated without the use of elaborate statistical analysis. We will refer to these situations as large state-space problems. Techniques that efficiently recover target information from such large datasets are subject to intense research, and will remain of high priority so long as our hunger for facts and figures remains unsatisfied. As a case in point, we consider the phenomenon of molecular self-assembly, in which molecules deposited on a surface spontaneously assemble into highly ordered structures [1][2][3]. Figure 1a illustrates the fabrication of graphene by self-assembly of molecules on a copper surface [4]. Upon heating, the molecules assemble into chain-shaped 'islands' (figure 1b,c), and on further heating a chemical reaction occurs between molecules, resulting in graphene (figure 1d). Island formation is a necessary 'stepping stone' for graphene formation, and one goal of this experiment is to identify conditions (temperature, type of molecule and type of surface) that support island formation. However, this experiment also yields considerable additional information, including the positions of the islands on the surface, the distances between the islands and so on. This information does not help us to identify island-forming conditions, although it may be interesting in other contexts. We are faced with the following problem: build a model for the molecular self-assembly process and then estimate the probability of forming particular islands (the 'target information') without estimating any other properties of the model. Such a model should possess a very large state space in order to incorporate every conceivable outcome of the experiment in figure 1. Molecular self-assembly, therefore, provides an example of a large state-space problem and is of sufficient scientific importance to warrant attention.
Numerous models and methods have been employed in the chemistry and physics literature to study the molecular self-assembly process. Some recent reports have described molecular dynamics simulations [5][6][7][8][9][10], density functional theory [11], kinetic Monte Carlo simulations [12], stochastic models [13], graph theory [14] and mean-field models [15], to name a few. However, the possibility of studying self-assembly by selective retrieval of target information as described above has not been considered in this literature. In fact, the need for such a study seems to be particularly acute for the case of molecular self-assembly. For example, while molecular dynamics simulations should eventually yield reliable statistics on island formation, the simulation times required for the entire state space to be sufficiently sampled are inaccessible with modern computational power. The enormous state space of typical molecular self-assembly models, therefore, provides a barrier to both data collection and data analysis. Specialized mathematical techniques that can astutely access the target information embedded in a molecular self-assembly model would, therefore, provide a new generation tools for studying

Chiral block assembly model
The chiral block assembly (CBA) model is a simple model for the molecular self-assembly process described in figure 1. Consider a finite square lattice L d of dimension d × d, where d is an integer. N blocks reside on the lattice (figure 2a), where N is a fixed and given integer. A block is a square subset of the lattice that contains 3 × 3 points. The centre-most point of the block is called the coordinate of the block. The remaining eight points belong to the perimeter of the block. The perimeters of each block are decomposed as follows. Let t be an arbitrary block and let F denote the perimeter of t. Let F 1 (respectively, F 2 , F 3 , F 4 ) be the three points in t that lie above (respectively, to the left of, below, to the right of) the coordinate of t. The sets F 1 , F 2 , F 3 and F 4 are called faces, and F is the union of its faces. The block t is said to have one of two types of orientations, o 1 and o 2 , and two face types, m 1 and m 2 . If t has orientation o 1 , then faces F 1 and F 3 are of type m 1 , and faces F 2 and F 4 are of type m 2 . If t has orientation o 2 , then faces F 1 and F 3 are of type m 2 and faces F 2 and F 4 are of type m 1 . Two blocks are said to touch at their faces if the intersection of their faces contains more than one point. In this case, the two blocks are said to be neighbours. Note that this definition excludes cases where two blocks only share a 'corner'. Two blocks are said to overlap if their coordinates are contained in their intersections. The blocks may be arranged on the lattice in any way and can touch at their faces, but no two blocks may overlap.
To define an interaction energy between neighbouring blocks (pairs of blocks that touch at their faces), we first classify the ways in which neighbours can be aligned with respect to each other. These alignments are referred as P, S and T and the rules for assigning the alignments are described in figure 2a. Note that the rules or assigning alignments are independent of the orientation of the block. For each pair of neighbouring blocks i and j, we assign an interaction energy ε r,m i ,m j , where r is the alignment of blocks i and j (S, P or T), and m i (m j ) is the face of block i (block j) which is touching the face of block j (block i). The condition ε r,m i ,m j = ε r,m j ,m i is enforced. The collection of these interaction energies is called the force field. The adjective 'chiral' refers to the fact that the alignment between any two neighbours does not possess mirror symmetry. This feature occurs in many real cases of molecular self-assembly on metal surfaces [4,24]. The CBA model is a special case of Winfree's tile assembly model [25][26][27].  Without further mention, we will assume periodic boundary conditions on the CBA model. This is achieved by replacing the lattice L d with the lattice L d , and then, imposing that the lattice points 1 and d + 1 in X d + 1 are identical and that lattice points 1 and d + 1 in Y d + 1 are identical. This results in the torus T d . The nomenclature lattice and dimension will be taken to mean the set X d × Y d and the integer d, respectively.
A configuration is any choice of block positions and orientations that satisfies the conditions for the CBA model (no overlap of blocks). Let Ω be the collection of all configurations. For each σ ∈ Ω, we assign the probability where k B and T 0 are positive constants (the Boltzmann constant and temperature, respectively) and (σ ) is the sum of the interaction energies for each pair of neighbouring blocks in σ . The normalizing constant in (2.1) is assumed unknown. A state is any choice of coordinates for the blocks that satisfy the conditions of the CBA model. We can write where T is the state space and O is all choices of orientations for the blocks in T. For each b ∈ T, we can assign the probability where ε(b, o) is the sum of interaction energies for each pair of neighbouring blocks in the configuration (b, o) ∈ Ω. Accordingly, we can write is the probability of the blocks having orientations o ∈ O given that the blocks are arranged according to b.

Equivalence class sampling
We wish to estimate the stationary state properties of the CBA model by sampling states from T and orientations from O according to the target distributions μ(·) and λ(o; ·), respectively. We will focus exclusively on the problem of sampling from T, because the problem of sampling orientations (which is equivalent to sampling the Ising model) has been studied thoroughly in other contexts [28].

State-space reduction and Markov chain Monte Carlo
The ECS technique involves two stages. In the first stage, the state space T is reduced in such a way that some 'target information' is preserved and the remaining superfluous information is eliminated. For the self-assembly problem, the target information pertains to the kinds of islands that are likely to appear according to ( model. Let b be an arbitrary state of the CBA model and denote the blocks in b by t 1 , t 2 , . . . , t N . An island in b is a subset of blocks I from b such that I.1. I cannot be split into two disjoint subsets I 1 and I 2 in which there is no pair t a ∈ I 1 and t a ∈ I 2 which are neighbours, and I.2. There does not exist a block t c ∈ I such that t c ∈ b and I = I ∪ t c satisfies I.1.
I.1. essentially says that an island cannot be split into two disjoint parts, and I.2. asserts than a subset of an island is not an island. Consider the two states b 1 and b 2 shown in figure 2b. Imprecisely, we can say that both b 1 and b 2 contain 'same kinds' of islands because if we rotate the islands in b 1 and translate them about the lattice, we can achieve b 2 . More precisely, two states b 1 and b 2 are said to be rotational isomorphs if RI.1. Each island in b 1 can be mapped onto an island in b 2 by rotations and translations alone, and RI.2. this mapping is one-to-one and onto between the islands of b 1 and the islands of b 2 .
Similarly, we say that two islands I 1 and I 2 are rotational isomorphs if I 1 can be mapped onto I 2 with rotations and translations alone. We can, therefore, partition T into equivalence classes E 1 , E 2 , . . . ,E h , where any two states b 1 and b 2 belong to the same equivalence class if and only if b 1 and b 2 are rotational isomorphs. We refer to the collection as the reduced state space. We also have the following.
Proof of theorem 3.1 is provided in appendix A.3. According to theorem 3.1, we can assign the probability to equivalence class E k ∈ H, where n(E k ) is the number of states that belong to equivalence class E k (the degeneracy of E k ), and μ(E k ) is the probability from equation (2.3) calculated for any member of E k . The problem of 'what kinds of islands appear at equilibrium' can be stated as 'which equivalence cases occur with relatively high probability v'. In order to obtain this target information, we only need to consider the reduced state space H, rather than the (considerably larger) space T. Note that, in contrast with the state space T, the reduced state space H does not grow as the dimension d of the lattice increases (providing that the number blocks N is fixed). Thus, for very large d, the reduced state space H will be considerably smaller than the full state space T. In passing, we note an important physical aspect of the ECS approach. It can be shown that where A(E k ) can be regarded as the 'free energy of equivalence class E k '. This is discussed in detail in appendix A.1. ECS, therefore, directly incorporates thermodynamics into the MCMC sampling framework, which is a highly desirable feature when the equilibrium (stationary) state properties of a molecular self-assembly model are of interest. In order to implement ECS in a computational environment, it is necessary to use a canonical representation for each equivalence class in H (figure 2c). In this study, the canonical representation C k of equivalence class E k is generated by choosing an arbitrary state b from E k , and letting the elements of C k be the islands from b as well as one empty element. The empty element is necessary for our implementation of MCMC as described in the following section. We do not distinguish between canonical representations that can be made equivalent by rotational isomorphism of their elements or re-ordering of their elements. Therefore, a canonical representation C k is a unique representation of the equivalence class E k . The number of islands in equivalence class E k is the number |C k | − 1.

Extension-reduction chain
To sample equivalence classes via the MCMC algorithm, we will employ the MH algorithm. We use Z to denote the Markov chain generated by the MH algorithm (the 'MH chain'), and Y to denote the Markov chain used to generate proposals for the MH algorithm (the 'proposal chain'). In this section, C k = C j = extensionreduction transformation Figure 3. Example of an extension-reduction transformation on equivalence class E k . The island in the red box is extended and the island in the dotted red box is reduced.
we introduce a suitable proposal chain called extension-reduction chain. To define the extension-reduction chain, we first construct a transformation K that maps equivalence classes to other equivalence classes at random. Fix an equivalence class E s with canonical representation C s and consider the following actions.
T.1. Choose island k from C s with probability where i k is the number of blocks in island k and α 1 is a positive, non-zero constant. Now, choose island j from C s with probability where α 2 is a positive, non-zero constant. T.2. For the island k chosen in step T.1, let I − k be the set of all islands that can be obtained by deleting a single block from island k. Each of the islands in I − k must satisfy the conditions I.1 and I.2 outlined earlier. Now, if there are pairs of islands in I − k that are rotationally isomorphic, delete one member of the pair and repeat until there are no pairs of islands in I − k which are rotationally isomorphic. Denote the resulting set by T − k . T − k is called the reduction set of island k. T.3. For the island j chosen in step T.1, let I + j be the set of all islands that can be obtained by adding a single block to island j. Then, delete islands from I + j until there are no two pairs of islands in I + j which are rotationally isomorphic. Denote the resulting set by T + j . T + j is called the extension set of island j. T.4. Replace island k in C s with a uniformly chosen element of T − k , and replace island j in C s with a uniformly chosen element of T + j . This produces a set C s . Add one empty island to C s if there are no empty islands in C s . If there are two empty islands in C s , delete one empty island. Set C r = C s .
The resulting set C r is a canonical representation for an equivalence class E r which may or may not be identical to E s . This transformation is denoted with the symbol K, and is referred to as the extensionreduction transformation. An example of an outcome of the extension-reduction transformation is shown in figure 3. Note that by virtue of equation (3.5), the empty island may be replaced with a single block with finite probability for any occurrence of the extension-reduction transformation. In such a case, we can intuitively think of the transformation as creating a new island by removing a block from an existing island.
The extension-reduction transformation K maps equivalence classes to equivalence classes at random. The probability is evaluated in appendix A.5. With this characterization of the extension-reduction transformation, we can generate a Markov chain with state space H as follows.
, and so on. Then the sequence Y 1 , Y 2 , Y 3 , . . . is a υ-irreducible and aperiodic Markov chain on H with transition probability P( Theorem 3.2 is proved in appendix A.3. The Markov chain Y in theorem 3.2 is the extension-reduction chain. The parameters α 1 and α 2 in equations (3.4) and (3.5) can be used to control the evolution of the Y. Y does not have stationary distribution υ, and to perform ECS we therefore use the transition probability in equation (3.6) to simulate the standard MH chain Z.

Approximation for the degeneracy factor
It does not appear possible to provide closed-form expressions for the degeneracy factor n(E k ) in equation (3.2) in the general case. However, we can form a good approximation to it with the following twostep method. (a) Compute the degeneracy factor exactly for a special case in which each island only covers a single lattice point. (b) Show that the degeneracy factor for the actual CBA model converges to the degeneracy factor obtained from this special case in a 'low-coverage limit', in which the lattice is extremely large compared with the number of blocks.
The concept of a semicanonical representation is needed to carry out step (a) described above. Choose an equivalence class E k and let b 1 and b 2 be any two states that belong to E k . Let U k and V k be the set of islands belonging to b 1 and b 2 , respectively. Suppose that U k and V k can be made equivalent by translation of their islands without rotation. U k and V k are called semicanonical representations of equivalence class E k . We do not distinguish between semicanonical representations that are equivalent under translation of islands, and therefore U k and V k are considered identical semicanonical representations. A canonical representation is also a semicanonical representation; however, two sets of islands corresponding to the same canonical representation may correspond to different semicanonical representations. Now, fix a semicanonical representation V k . We can partition the set of islands in V k into subsets V 1 k , V 2 k , . . . , V r k , where two islands belong to the same subset if and only if they can be made equivalent by translation about the lattice. We will refer to the quantity as the degeneracy of the semicanonical representation V k , where m is the number of islands contained in V k , and a 1 , . . . , a r are the number of islands belonging to the subsets V 1 k , V 2 k , . . . , V r k described above. We now consider the meaning of n(V k ) in equation (3.7). n(V k ) is the number of ways to divide the d 2 points of the torus T d into r where permutation of any points belonging to the same subset is does affect n(V k ). The sets V 1 k , V 2 k , . . . , V r k are exactly those defined in the previous paragraph, whereas the set V 0 k can be thought of as the set of d 2 -m 'unoccupied' points of the lattice. Equation (3.7) therefore counts the number of ways to assign islands from a semicanonical representation to the lattice, but in such a way that the islands only cover a single point of the lattice while retaining information on their shape. The number where Γ (E k ) is the set of all distinct semicanonical representations available to equivalence class E k , can be interpreted as a degeneracy factor of the case where the islands only cover a single point of the lattice. Considering the actual degeneracy factor n(E k ) in equation (3.2), we in general have that n(E k ) < n 0 (E k ). However, in appendix A.4, we show that for any equivalence class E k , as d → ∞ with N (the number of blocks) fixed. Following step (b) mentioned above, we therefore employ the low-coverage approximation n(E k ) ≈ n 0 (E k ), (3.10) in order to perform ECS in the following sections. The low-coverage approximation is mathematically permissible for the following reason. Let Z and Z 0 be the MH Markov chains when sampling from the actual distribution υ in equation (3.2) and when using the low-coverage approximation, respectively. Without loss, we can take Z and Z 0 to be υand υ 0 (E k ) = n 0 (E k )μ(E k )-distributed random variables, respectively. Equation  2) as a function of Z n , the state of the MH chain at step n, for a typical simulation of ECS at 100 K (see text for details). The red line corresponds to a n , the acceptance rates up to step n. Plot (a) shows the first 10 5 steps of the simulation, and plot (b) shows the entire 10 6 steps of the simulation. Table 1. Parameters used in the simulations described in §4. T 1 0 , T 2 0 and T step 0 , respectively, refer to the lowest temperature, highest temperature and temperature step used for parallel tempering.  might be regarded as a kind of 'two-dimensional lattice gas' model. This very natural physical picture further supports the application of the low-coverage approximation in ECS. Note that study of MCMC Markov chains with approximate transition kernels appears to related to research on the 'pseudo marginal' MCMC algorithm [29][30][31][32].

Computational implementation
We have written a code called IslandPrediction which performs ECS on the CBA model as described above. To enhance convergence rates, the IslandPrediction code incorporates parallel tempering into the MH algorithm, exactly as described in reference [33]. The replica MH chains differ in the temperature of their stationary distribution, and each MH Markov chain has proposals generated according to its own extension-reduction chain, where these extension-reduction chains are simulated independently of one another. IslandPrediction is available as an R script [34,35] and as a program written in Go [36] (both language codes available upon request). The Go version of IslandPrediction is intended for multicore environments. A striking feature of IslandPrediction is that the MH Markov chain converges relatively quickly with the length of the extension-reduction chain. Using the conditions reported in table 1, we observe burn-in periods of around 20 000-30 000 steps of the MH algorithm and an asymptotic acceptance rate of between 0.1 and 0.5 (figure 4). Reasonable estimates of quantities such as island size distribution can be obtained after 100 000 steps of the MH algorithm, assuming that the burn-in period has been neglected. The simulations presented in the following section were actually run for 10 6 steps, with the first 5 × 10 5 steps removed before analysis. Results acquired with the shorter simulation criterion of only 100 000 steps described above showed no significant difference from the longer simulations

Simulation results
We now present results obtained using the ECS procedure with the conditions shown in table 1. The block orientations of any interesting islands were optimized individually using a standard technique used to study Ising models (see appendix A.1). Figure 5a shows randomly chosen equivalence classes that appeared beyond the burn-in regime in our calculations at 100 and 150 K. At low temperatures, we find that equivalence classes almost always contain a single island, whereas at higher temperatures, these equivalence classes instead consist of multiple, smaller islands. This observation is quantified in figure 5b, which shows the island size distributions computed from the above simulations. The island size distribution f k is the probability of an island containing k blocks appearing in an equivalence class chosen randomly with respect to equation (3.2). At high temperatures, the island size distribution f k decays rapidly with the number of blocks k in the island. A similar decay was reported for the graphene size distribution obtained in the experiment described in fig. 1 [4], which suggests that in these experiments, graphene was produced under conditions in which many small islands are present.
The behaviour of the model is most interesting at low temperatures. As temperature decreases, the distribution spreads towards larger island sizes. At about 120 K, a second sharp mode starts to appear at k = 10 blocks. The occurrence of this mode shows that, at low temperatures, once every block gathers into a single island, the blocks tend to remain in this state. In order to move away from this state, one of the blocks in this island must break away, resulting in a state containing two islands. One of these islands contains nine blocks, and one contains one block. From a physical point of view, this second island contributes some entropy to the model; however, it also increases the overall energy of the model. This energy cost is particularly important at low temperature, and so this state either quickly reverts back to being a single large island, or another block departs from the island of size nine and joins the island containing one block. The appearance of two modes is therefore attributed to the fact that, once one block detaches from a large island, other blocks can be expected to detach in order to stabilize this block. As temperature decreases further, the blocks become increasingly stable when they are all part of a single island, and the probability density becomes concentrated on islands of size 10. Figure 5c shows the alignment distribution estimated from the above simulations. This corresponds to the expected fraction of P-, S-or T-type alignments observed in a randomly chosen equivalence class. At very low temperatures, almost all alignments are of the P-type. This quantifies the observation that large, chain-shaped islands tend to appear at low temperature, in agreement with the experimental observations described in figure 1. As temperature increases, the fraction of S-type alignments increases. This is a consequence of the entropic contribution to the model (described by the degeneracy term in equation (3.8)), which favours the appearance of islands of low symmetry.

Conclusion
Large state-space problems are rapidly becoming commonplace in most fields of science. For these problems, it is essential to have available methods that can efficiently extract 'target information' without burdening the researcher with volumes of superfluous information. This paper has presented a simple molecular self-assembly model that contains the basic elements of a large state-space problem, and developed the ECS technique for predicting a particular piece of target information (the 'kinds of islands' that appear with high probability in the model) with high precision. The key feature of ECS is that the model state space is reduced to a set of equivalence classes, where two states b 1 and b 2 belong to the same equivalence class if and only if b 1 can be transformed into b 2 under a suitably chosen map. By development of a special Markov chain on this reduced state space, we achieved efficient estimation of the target information in the model via MCMC sampling. While we have considered the special (but important) example of molecular self-assembly, we believe that the approach used in this study provides a 'line-of-thinking' also for other kinds of large state-space problems.
Two interesting mathematical problems are encountered in the development of ECS. In order to generate proposals for the MCMC sampling component of ECS (assuming that the MH algorithm is employed), one needs a Markov chain with state space H and calculable transition probabilities, where H is the collection of equivalence classes (reduced state space). Moreover, good estimates of the quantities n(E k ), the number of states contained in equivalence class E k , are needed to ensure that the MCMC algorithm samples from a distribution that is sufficiently close to the actual target distribution of the model. In this paper, we presented the extension-reduction chain as one example of such a Markov chain. However, there is motivation to develop other Markov chains and to study the convergence properties of the algorithms in detail. In particular, the types of Markov chains available for ECS are expected to depend upon the specific problem under consideration. We also presented an approximation for n(E k ) in the case of low-coverage of molecules on the surface, and showed that this approximation converges as desired in the limit of zero surface coverage. While many experimental studies of molecular self-assembly use low-coverage conditions, it may also be interesting to develop other approximations to the degeneracies n(E k ) that hold in the high-coverage limit. While not so relevant for the generation of small molecular islands, experiments in the high-coverage limit are widely used to generate so-called two-dimensional crystals via molecular self-assembly [24]. The high-coverage regime is also of academic interest as it often involves relatively unintuitive self-assembly behaviour [9]. Another open problem for implementing ECS algorithms concerns the development of efficient algorithms for detecting rotational isomorphism between islands and configurations of self-assembly models. This computation is a bottleneck in our current implementation of ECS. Perhaps, the greatest merit of studying molecular self-assembly via ECS is that the routes to improvement are very clear. We will explore them in detail in future research.
Data accessibility. The equivalence class sample used to generate figure 5 is available as an .Rdata workspace file for R. Upon loading this data into the R environment, one should type 'Message' to receive an explanation of the data structure. This data can be freely downloaded at http://dx.doi.org/10.5061/dryad.vd156. From statistical thermodynamics, we therefore obtain the identity where the 'full' free energy A(E k ) is defined as In this free energy, the energetic component arises from the interaction between neighbouring blocks for each state b in equivalence class E k , and the entropic component is a sum of two sources of entropy: an entropy arising from the multiple ways of choosing the orientations for the blocks, and an entropy arising from the multiple ways of placing the blocks on the lattice to achieve a state from equivalence class E k .
In the view of equation (A 3), ECS chooses equivalence classes with respect to a free energy. This is in contrast with sampling the configurations directly from (2.1), in which configurations are chosen with respect to their energy.

A.2. MCMC for sampling of block orientation distribution
For an equivalence class E with canonical representation C, consider an island I from C. Let O be the orientation space for the blocks in I. For convenience, denote the possible two orientations of the blocks as α and β, respectively. A Markov chain Y on O can be generated as follows. We generate a sequence of random variables Bi 1 , Bi 2 , . . . , Bi m from a binomial distribution with n I trials and a probability p of success per trial, where n I is the number of blocks in island I. Then, the mth step of the Markov chain, Y m is generated by choosing Bi m blocks at random and assigning them the orientation β, and assigning the orientation α to the other blocks. An MH algorithm using Y to generate proposals can then performed to analyse the distribution of the block orientations in the island I.
For all cases in this study, the MH algorithm appeared to converge within tens of seconds for each island analysed. Markov chains were typically run for 3000 steps with p = 0.5. The block orientations from O that appeared most often in the MH output were then identified as the optimum choice of block orientations for the island.

A.3. Theorems and proofs
Proof of theorem 3.1. Let us temporarily denote the two possible orientations for each block as α and β, respectively. We first consider the case where the states b 1 and b 2 each contain one island I 1 and I 2 , respectively. If b 1 and b 2 are in the same equivalence class, then I 1 and I 2 are rotational isomorphs, and according to condition RI.1 of the main text there is a bijection v that maps the labels of the blocks in I 1 to the labels of the blocks in I 2 . Let m be the number of times that the island I 1  According to condition RI.2 of the main text, there is a one-to-one correspondence between the islands in (I 1 , I 2 , . . . , I n ) and (J 1 , J 2 , . . . , J n ). Without loss, suppose that this correspondence is of the form (I 1 , J 1 ), (I 2 , J 2 ), . . . , and (I n , J n ). Following the steps from the first part of the proof shows that μ(I k ) = μ(I k ) for all k, which completes the proof.
Proof of theorem 3.2. Y is a Markov chain, because the event that Y n = E k is depends upon Y n−1 but not on Y n−m , for n ≥ m > 1. Moreover, by the definition of q k→j we have that P(Y 2 = E j |Y 1 = E k ) = q k→j . We proceed as follows to prove that Y is υ-irreducible. Let E u denote the equivalence class with a canonical representation where the island A 1 , which is isomorphic to a single block, appears N times in (A 13). Now consider any arbitrary equivalence class E k with a canonical representation where the islands in C k are ordered such that |I 1 | ≥ |I 2 | ≥ · · · ≥ |I m |, where |I j | is the number of blocks in island I j . Consider a transformation E k → E j where are the canonical representations of E k and E j and |J 1 | = |I 1 | − 1. To describe this transformation, we say that island I 1 is reduced to J 1 and that the empty island ∅ is extended to A 1 . This transformation occurs with probability, where P k I 1 is the probability that island I 1 of equivalence class k is reduced during the extension-reduction transformation, and Q k ∅,I 1 is the probability that island ∅ is extended given that I 1 is reduced during the extension-reduction transformation. These quantities correspond to equations (3.4) and (3.5) of the main text, respectively. We also have that the reverse transformation E j → E k occurs with probability Applying this argument repeatedly shows that there is a finite sequence of extension-reduction transformations from an arbitrary equivalence class E k to E u with finite probability. We also have that there is a sequence of extension-reduction transformations from an arbitrary equivalence class E h to E u with finite probability Putting these arguments together shows that an arbitrary equivalence class E k can be transformed into another arbitrary equivalence class E h via a sequence of extension-reduction transformation with finite probability. This confirms the υ-irreducibility of the Markov chain Y.
To prove the aperiodicity of Y, consider an equivalence class E k with canonical representation C k = {I 1 , I 2 , . . . , A 1 , ∅}. By extending the empty island ∅ and reducing the island A 1 , we can achieve the transformation E k → E k . The event occurs with probability which shows that gcd(n − 1: P(Y n = E k |Y 1 = E k ) > 0) = 1. Aperiodicity for any arbitrary equivalence class then follows from the irreducibility of the Markov chain.

Remark.
A formula for the probability q k→j is provided in appendix A.5.

A.4. Proof of equation (3.9)
Let T d be a torus as constructed in §3.3 and fix an equivalence class E k . Suppose that E k contains m islands. Let x and y be any distinct choice of m points from X d + 1 and Y d + 1 , respectively. We call the set x × y a grid. We consider two families of grids, F 0 and F 1 . Grids from F 0 are generated without any restrictions on what points may be chosen from X d + 1 and Y d + 1 . The number of grids contained in F 0 is equal to The set x × y is therefore a grid. F 1 is defined as the union of all grids which can be constructed in this manner, where the number H is identical for all such constructions. Letting P m be the number of ways to choose m sets of H consecutive integers from the set {1, 2, . . . , d}, the number of grids contained in F 1 is therefore This proof involves three steps.
Step 1 involves computing the number P m in equation (A 22). In step 2, we show that s 1 /s 0 → 1 as d → ∞ with N (number of blocks) fixed. In step 3, we show that for any equivalence class E k containing m islands, s 1 /s 0 → 1 implies that n(E k )/n 0 (E k ) → 1, where n(E k ) is the degeneracy factor for equivalence class E k , and n 0 (E k ) is defined in equation (3.8). Step 1. Let J 1 , J 2 , . . . , J m be m mutually exclusive sets of H consecutive integers from the set {1, 2, . . . , d}. To compute P m , we adopt the convention that J 1 < J 2 < · · · < J m , where J a < J b means that there is no integer in J a which is greater than or equal to any of the integers in J b . Let Step 2. It is sufficient to show that as d → ∞, where B = mH and A = m -1. Using the identities we can write where C = d -m(H -1) -1. Equation (A 34) was obtained with the diagonal summation identity (p. 104 of [37]). Now, for integer R > m, we have Step 3. Let I 1 , I 2 , . . . , I m denote the islands contained in canonical representation C k for equivalence class E k . For each 1 ≤ k ≤ m, choose one block t k from I k . The blocks t 1 , t 2 , . . . , t m are called tagged blocks. A state belonging to equivalence class E k can be generated in the following way. Choose a grid g from F 0 , and then choose m points from g. Then, place the islands I 1 , I 2 , . . . , I m on the torus T d such that (a) the coordinates of the m tagged blocks have a one-to-one and onto correspondence with the m points chosen from g and (b) no two islands overlap. If condition (b) cannot be fulfilled, then we choose another grid from F 0 and start again. We call this procedure T. We can also define anther procedure T 0 , which differs from T only in that condition (b) is not enforced. We can also define yet another procedure T 1 , which is identical to T 0 except that the family F 0 is replaced with the family F 1 . Now, any state in E k can be generated via T. The number of unique states (up to rotational isomorphism) that procedure T can yield is therefore equal to the degeneracy factor n(E k ). Similarly, the number of unique states that procedure T 0 can yield is equal to the quantity n 0 (E k ) in equation (3.8) of the main text. Let n 1 (E k ) be the number of unique states that procedure T 1 can yield. For sufficiently large H and sufficiently large d > mH, we have It is sufficient to show that s 1 /s 0 → 1 implies n 1 (E k )/n 0 (E k ) → 1. However, from the fact that F 1 ⊂ F 0 , we can write the identity s 1 /s 0 → 1 then implies that F 0 \F 1 → ∅ and hence F 0 → F 1 as d → ∞. However, according to procedures T 0 and T 1 , the numbers n 1 (E k ) only differ n 0 (E k ) when the families F 1 and F 0 contain different grids. It therefore follows that n 1 (E k )/n 0 (E k ) → 1 and hence that n(E k )/n 0 (E k ) → 1 as d → ∞.

A.5. Formulae for the probability q j→k
In this section, we present a method to compute the probability q j→k that the extension-reduction transformation will transform equivalence class E k into equivalence class E j . Our calculation uses a reaction representation for the extension-reduction transformation E k → E j . Let R kj = C k ∪ C j , where C k is a canonical representation of equivalence class E k . Then, remove islands from R kj until R kj contains no pairs of rotational isomorphic islands, and denote the resulting set by R kj . Label the islands in R kj as 1, 2, . . . , and s. For the ith element of R kj , we define the numbers By convention, we set the sth element of R kj to be the empty island. This implies that n s = m s = 1 for all extension-reduction transformations. The labels 1, 2, . . . , s correspond to unique elements of R kj . The reaction characteristic for the transformation E k → E j is the difference between the product and reactant sets, By convention, the island that gains a block is represented by the upper transformation A → C, and the island that loses a block is represented by the lower transformation B → D. We say that A is extended to C and B is reduced to D during this transformation. In analysing extension-reduction transformations with the composition {−1, −1, +1, +1}, we will also have to consider the case where A is reduced and B is extended, as well as the case where C is swapped for D.
It is therefore possible to characterize a particular extension-reduction transformation with the composition of its reaction characteristic, and we can speak of the composition of an extension-reduction transformation without ambiguity. The following theorem lists all possible compositions of the extensionreduction transformation.  a If B is reduced to the empty island, the B must contain one block only, which incorrectly implies that A = ∅. b Extension of the empty island to B implies that B contains one block only. Reduction of B to D then incorrectly implies that D is an empty island. c D = ∅ and D = A incorrectly implies that A = ∅. d Reduction of B to the empty island, and extension of the empty island to C imply that both B and C contain a single block, and hence that B = C. But this violates the condition M = 1.
Theorem. There are exactly nine compositions for the extension-reduction transformation, as listed in table 2.
Proof. Without reference to any particular composition, consider a general reaction representation of the type Unless explicitly mentioned, we will assume that none of A, B, C or D are empty islands. Let    Formulae for q k→j can be easily deduced by analysing each possible composition of the reaction characteristic case by case. For example, suppose that the reaction characteristic for extension-reduction transformation E k → E j has composition {−1, −1, +1, +1}. Tables 3-5 from the proof of theorem A.4 then show that this composition corresponds uniquely to the reaction representation (A 46) The islands A, B, C and D can be identified by comparing the sets RCT k→j and PDT k→j . The corresponding probability for the transformation E k → E j is q k→j = n A n B (q where n A is the coefficient from RCT k→j which corresponds to island A, q k→j BA,DC is the probability that island B is reduced, island A is extended, island D is chosen from the reduction set of B, and island C is chosen from the extension set of A. According to §2 of the main text (particularly steps T.1-T.4 of the extension-reduction transformation), we have where P k B is the probability of reducing island B (equation (3.4) of the main text), Q k AB is the probability of extending island A following reduction of island B (equation (3.5) of the main text), I − B is the reduction set of B, and I + A is the extension set of A. The notation D ∈ I − B means that there is an island contained in I − B that is rotationally isomorphic to D. The transition probabilities for every case in table 2 are listed  in table 6.