Designing the optimal bit: balancing energetic cost, speed and reliability

We consider the challenge of operating a reliable bit that can be rapidly erased. We find that both erasing and reliability times are non-monotonic in the underlying friction, leading to a trade-off between erasing speed and bit reliability. Fast erasure is possible at the expense of low reliability at moderate friction, and high reliability comes at the expense of slow erasure in the underdamped and overdamped limits. Within a given class of bit parameters and control strategies, we define ‘optimal’ designs of bits that meet the desired reliability and erasing time requirements with the lowest operational work cost. We find that optimal designs always saturate the bound on the erasing time requirement, but can exceed the required reliability time if critically damped. The non-trivial geometry of the reliability and erasing time scales allows us to exclude large regions of parameter space as suboptimal. We find that optimal designs are either critically damped or close to critical damping under the erasing procedure.


Introduction
Certain information-processing operations, such as erasing a bit or copying the state of one bit into another previously randomized bit, have fundamental lower bounds on work input [1–5]. These lower bounds, such as the famous $k_B T \ln 2$ minimal cost of erasure, arise from equilibrium thermodynamics: any entropy reduction in the information-carrying system must be compensated by an entropy increase elsewhere. Practical devices, however, do not approach these bounds [6,7], and insights gained from thinking about the lower bound have not yet translated into more energy-efficient technology. A partial explanation is that man-made devices and biological cells need to operate on fast time scales and hence cannot use the quasistatic manipulations necessary to reach the lower bounds [8,9]. An alternative suggestion, due to von Neumann, is that the need to store information for long periods of time (reliability) leads to high-cost architectures [10]. We explore the interplay between reliability, speed and the energetic cost of bit operation. Equilibrium thermodynamic bounds such as the Landauer limit cannot account for these inherently kinetic phenomena.
This general question of how to design fast, cheap and reliable bits has obvious technological relevance to the optimal design of low-power computational devices [11–13]. Additionally, since the discovery of the structure of DNA and the formulation of the central dogma of molecular biology, it has become well accepted that information processing is at the heart of many natural phenomena. Many authors have explored information processing in biological systems, both to understand natural examples and to design synthetic analogues [3,9,14–19]. The interplay between reliability, speed and cost is also relevant here, although it remains under-explored.
In this paper, we explore the challenge of building fast, cheap and reliable bits, and provide a framework for its analysis in terms of reliability and erasure time scales. We also take the first steps towards exploring the physics of the optimal design problem by considering a simple model: a particle in a one-dimensional potential, which is a quartic double-well potential in the device's 'resting' state. We require that the bit be reliable, so that a particle equilibrated in either well stays in that well for a specified long time on average. Simultaneously, we require the implementation of an 'erase' or 'reset' operation using an external control, so that erasure is completed within a specified short amount of time. Our principal question is how to choose the design parameters (the height of the double well, the friction coefficient and the control parameters) so as to guarantee these requirements without expending more energy than necessary. Our main contribution is an exploration of this design space, which demonstrates the previously under-appreciated role of friction. In particular, we identify a 'Goldilocks zone' in which the friction coefficient takes moderate values. This is somewhat counterintuitive, because friction has historically been viewed as a nuisance to computing, to be made as low as possible [20–23].
In §2, we describe the model that will provide intuition for our work. We formalize the time scale over which the bit stores information through the notion of reliability time. In §2b(i), we describe one simple family of control protocols for resetting a bit, which we will use to illustrate our subsequent ideas. In §2b(ii), we introduce the notion of erasing time, and in §2b(iii), we calculate the work done in erasing a bit with this form of control.
In §3, we consolidate from the literature the analytical forms and approximations for our two time scales of interest, and confirm them with numerical simulations. We find that both the reliability and erasing time scales are non-monotonic, roughly U-shaped functions of the friction coefficient. It follows that high reliability is obtained by setting the friction to a low or high value, whereas a low erasing time is favoured by an intermediate value of friction, implying a conflict between the two time scales for a given class of protocols. In §4, we investigate how this conflict feeds into the geometry of optimal bits: bits that fulfil the desired reliability and erasing time requirements with the minimum energy cost. We find and partially characterize a 'Goldilocks zone' in design space where optimal bits reside. In §5, we discuss the robustness of our results when more freedom is allowed in the choice of design parameters and the control protocol.

The double-well bit
We will represent a device that can store one bit of information by a particle in a symmetric bistable potential $U_{A,B}(x) = A(x^2/B^2 - 1)^2$, where $A$ is the height of the well (the barrier height) and $+B$ and $-B$ are the positions of the minima of the right and left wells, respectively. We will refer to the device as a whole as 'a bit'. The device reports '0' when the particle is in the left well, i.e. $x < 0$, and reports '1' otherwise (figure 1a). The dynamics of the particle is described by the Langevin equation

$$dx = \frac{p}{m}\,dt, \qquad dp = -\partial_x U_{A,B}(x)\,dt - \gamma p\,dt + \sqrt{2m\gamma k_B T}\,dW, \tag{2.1}$$

where $m$ is the mass of the particle, $x$ is the position, $p$ is the momentum, $\gamma$ is the friction coefficient of the medium, $U_{A,B}(x)$ is the potential, $k_B$ is Boltzmann's constant, $T$ is the temperature of the heat bath and $W$ is a standard Wiener process. The term $\sqrt{2m\gamma k_B T}\,dW$ represents the effect of noise from the surroundings.
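To make equation (2.1) concrete, the following minimal Python sketch (assuming NumPy; all function names are hypothetical) evaluates $U_{A,B}$ and its force and advances the dynamics by one step of the BAOAB splitting scheme. BAOAB is a standard Langevin integrator swapped in here for illustration; it is not necessarily the 'Langevin A' algorithm of [37] used for the simulations reported in this paper.

```python
import numpy as np

def potential(x, A, B=1.0):
    """Resting-state double well U_{A,B}(x) = A (x^2/B^2 - 1)^2."""
    return A * (x**2 / B**2 - 1.0) ** 2

def force(x, A, B=1.0):
    """Force -dU/dx = -4 A x (x^2/B^2 - 1) / B^2."""
    return -4.0 * A * x * (x**2 / B**2 - 1.0) / B**2

def baoab_step(x, p, dt, A, gamma, B=1.0, m=1.0, kT=1.0, rng=None):
    """One BAOAB step for dx = p/m dt, dp = -U'(x) dt - gamma p dt + sqrt(2 m gamma kT) dW."""
    rng = np.random.default_rng() if rng is None else rng
    p = p + 0.5 * dt * force(x, A, B)   # B: half kick from the potential
    x = x + 0.5 * dt * p / m            # A: half drift
    c = np.exp(-gamma * dt)             # O: exact Ornstein-Uhlenbeck update of p
    p = c * p + np.sqrt(m * kT * (1.0 - c**2)) * rng.standard_normal(np.shape(p))
    x = x + 0.5 * dt * p / m            # A: half drift
    p = p + 0.5 * dt * force(x, A, B)   # B: half kick
    return x, p
```

With $\gamma = 0$ the O-step is the identity and the scheme reduces to velocity Verlet, which is a convenient sanity check on energy conservation.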
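The Hamiltonian of the system is $H(x,p) = p^2/2m + U_{A,B}(x)$, and the Gibbs distribution

$$\pi(x,p) = \frac{1}{Z}\exp\left(-\frac{H(x,p)}{k_B T}\right), \qquad Z = \iint e^{-H(x,p)/k_B T}\,dx\,dp, \tag{2.3}$$

is approached as the system relaxes to equilibrium. Convergence to $\pi(x,p)$ happens exponentially fast, at a rate given by the first non-zero eigenvalue of the generator $\mathcal{L}$ of the dynamics (2.1) [26].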
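Because $H$ is separable, initial conditions from $\pi_1$ (the right-well restriction of the Gibbs distribution (2.3)) can be generated independently in $x$ and $p$: momenta are Gaussian with variance $m k_B T$, and positions can be rejection-sampled. The sketch below is hypothetical helper code in reduced units; it assumes the well can be truncated at $x = 2B$, where $U_{A,B} = 9A$. Samples from $\pi_0$ follow by flipping the sign of $x$.

```python
import numpy as np

def sample_right_well(n, A, B=1.0, m=1.0, kT=1.0, x_max=None, rng=None):
    """Draw n samples (x, p) from pi_1, the Gibbs distribution restricted to x >= 0."""
    rng = np.random.default_rng() if rng is None else rng
    x_max = 2.0 * B if x_max is None else x_max  # assumed truncation of the well
    xs = []
    while len(xs) < n:
        x = rng.uniform(0.0, x_max, size=n)
        u = rng.uniform(size=n)
        # U >= 0, so exp(-U/kT) <= 1 is a valid acceptance probability.
        accept = u < np.exp(-A * (x**2 / B**2 - 1.0) ** 2 / kT)
        xs.extend(x[accept].tolist())
    x = np.array(xs[:n])
    p = rng.normal(0.0, np.sqrt(m * kT), size=n)  # Maxwell-Boltzmann momenta
    return x, p
```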

(a) Reliability
A device to store information should be able to store it with high fidelity for a specified long period of time. We introduce the reliability time to represent the time scale over which our device can store data. Specifically, we define the reliability time $\tau_r$ as the expected first passage time for the particle to cross the barrier of the resting-state potential of the bit, given the Gibbs distribution $\pi(x,p)$ (equation (2.3)) as the initial distribution. That is,

$$\tau_r = \mathbb{E}\left[\inf\{t \ge 0 : x(t) = 0\}\right],$$

where the expectation is over trajectories $(x(t), p(t))$ distributed as specified by equation (2.1) from the initial condition $(x(0), p(0)) \sim \pi(x,p)$. Note that, by the symmetry of the potential, $\tau_r$ is also the expected first passage time to cross the barrier for a bit prepared with a Gibbs distribution confined to either the left-hand well, $\pi_0(x,p)$, or the right-hand well, $\pi_1(x,p)$, where

$$\pi_0(x,p) = \begin{cases} 2\pi(x,p), & x < 0,\\ 0, & x \ge 0, \end{cases} \qquad \pi_1(x,p) = \begin{cases} 0, & x < 0,\\ 2\pi(x,p), & x \ge 0. \end{cases}$$

Intuitively, once a typical particle has had enough time to reach the top of the barrier, the stored data are no longer reliable.
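The definition of $\tau_r$ translates directly into a Monte Carlo estimate: sample from $\pi_0$, integrate the Langevin equation and record the first time each trajectory reaches $x = 0$. The sketch below is a hypothetical helper, not the authors' code. It assumes reduced units, uses BAOAB as a stand-in integrator, detects crossings only at discrete time steps (so it slightly overestimates first passage times) and is practical only for modest barrier heights, since $\tau_r$ grows roughly exponentially with $A$.

```python
import numpy as np

def estimate_reliability_time(A, gamma, n=200, dt=0.005, max_steps=200_000,
                              B=1.0, m=1.0, kT=1.0, seed=0):
    """Monte Carlo estimate of tau_r starting from pi_0 (left well).

    Returns (mean first time with x(t) >= 0, number of trajectories that never crossed).
    """
    rng = np.random.default_rng(seed)

    def U(x):
        return A * (x**2 / B**2 - 1.0) ** 2

    def f(x):  # force -dU/dx
        return -4.0 * A * x * (x**2 / B**2 - 1.0) / B**2

    # Positions from pi_0 by rejection sampling on [-2B, 0); Maxwell-Boltzmann momenta.
    xs = np.empty(0)
    while xs.size < n:
        trial = rng.uniform(-2.0 * B, 0.0, size=n)
        keep = rng.uniform(size=n) < np.exp(-U(trial) / kT)
        xs = np.concatenate([xs, trial[keep]])
    x = xs[:n].copy()
    p = rng.normal(0.0, np.sqrt(m * kT), size=n)

    c = np.exp(-gamma * dt)
    noise = np.sqrt(m * kT * (1.0 - c**2))
    hit_time = np.full(n, np.nan)
    alive = np.ones(n, dtype=bool)
    for step in range(1, max_steps + 1):
        # One BAOAB step for every trajectory; finished trajectories are simply ignored.
        p += 0.5 * dt * f(x)
        x += 0.5 * dt * p / m
        p = c * p + noise * rng.standard_normal(n)
        x += 0.5 * dt * p / m
        p += 0.5 * dt * f(x)
        crossed = alive & (x >= 0.0)
        hit_time[crossed] = step * dt
        alive &= ~crossed
        if not alive.any():
            break
    return float(np.nanmean(hit_time)), int(alive.sum())
```

A non-zero second return value signals that `max_steps` was too small and the mean is biased low.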

(b) Setting information
A device intended to store information must provide functionality to load, or set, this information into the device. Setting information is a two-bit operation. A common use case is when a reference bit and the bit to be set are initially at some arbitrary values. We require that after the SET operation the reference bit is unchanged, whereas the bit to be set now holds a copy of the reference bit. This is the operation that Szilard [1] refers to as 'copying' (by contrast, Landauer [2,27] chooses to reserve the word 'copying' for the operation where the bit to be set is initially already known to be in the state '0').

Figure 1. A bit as represented by a particle in a one-dimensional potential. (a) The bit in its resting state, with a barrier of height 'A' separating particle locations that correspond to bit values of 0 or 1. (b) A control potential as in example 2.1 is applied to erase the stored data.
Note that in the operation of setting information, or copying in the sense of Szilard, the two bits are initially uncorrelated and unknown, whereas after the operation they are still unknown but correlated. Thus implementing this operation requires decreasing the entropy of the system. Since a one-bit system is easier to study than a two-bit system, we will investigate a one-bit proxy for the task of decreasing the entropy of the system, namely the task of erasing a bit.
Erasing involves taking a device whose initial state is maximally unknown into a known reference state, usually '0'. Somewhat counter-intuitively, given the name, erasing increases the information we know about the system. What is erased is not information but randomness. It helps to keep in mind the example of erasing a blackboard where some random state with chalk marks is reset to the 'all clear' state.

(i) Erasing
The example that follows describes a simple family of control potentials to implement the erasing operation for our device, which will form the basis of our analysis. One control potential from this family is illustrated in figure 1b. We chose such a simple class of controls to make a full understanding feasible, setting a framework for analysing more complex protocols. We also note that arbitrary variation of a physical potential is highly non-trivial in reality; experimental studies in which complex time-dependent potentials have been applied in fact use highly dissipative mechanisms to generate 'effective' potentials [28,29].

Example 2.1. Our control potentials $V_F(x)$ are described by a single parameter $F \in \mathbb{R}_{>0}$ as follows:

The Langevin equation in the presence of a control is

$$dx = \frac{p}{m}\,dt, \qquad dp = -\partial_x\left[U_{A,B}(x) + V_F(x)\right]dt - \gamma p\,dt + \sqrt{2m\gamma k_B T}\,dW. \tag{2.7}$$

Note that the control potential, as defined, is not differentiable at the boundary of the region in which it is non-zero. In practice, we assume that $\partial_x V_F$ changes rapidly but continuously in a small vicinity around these points.
In this work, we will consider variation of $A$, $F$ and $\gamma$ at fixed $m$, $B$ and $T$. In this case, $m$ specifies the natural mass scale, $B$ the natural length scale and $k_B T$ the natural energy scale; the natural time scale is then $\sqrt{mB^2/k_B T}$. Henceforth, all numerical quantities will be reported using reduced units defined with respect to these natural scales, although $m$, $B$ and $k_B T$ will be retained within formulae.

(ii) Operational view of erasing
The speed of bit operations is of practical importance: a useful bit must be reliable on much longer time scales than those required to set or switch it. The control is switched on at time 0 and switched off at an appropriately chosen time $\tau$. The time $\tau$ is chosen beforehand and does not depend on the details of individual trajectories; a trajectory-dependent control would require measurement and feedback, which would itself need to be accounted for [30–35]. We could declare erasing complete, and switch off the control, as soon as a majority of the trajectories are expected to be in the left well. However, many of these 'erased' bits would have high energies compared with typical bits drawn from the equilibrium distribution in the left well, $\pi_0(x,p)$, and could therefore return to the right well after a very short stay in the left well. We therefore impose a more stringent condition: the time $\tau$ should be large enough that the majority of bits are in the target well, with an expected next passage time close to the reliability time.
One way to guarantee that the next passage time is long is to insist on mixing, in the sense that the distribution, initially $\pi(x,p)$, comes close to the distribution of particles thermalized in the left-hand well, $\pi_0(x,p)$. If this happens, the expected next passage time will be equal to, or close to, the expected first passage time. However, we found this criterion too stringent for the following reason. At the end of the erasing protocol, it is not necessary that the distribution be close to $\pi_0(x,p)$; it suffices that the particles relax to this distribution much faster than they cross back into the right-hand well, so that their barrier passage times are representative of particles initialized with $\pi_0(x,p)$. Nonetheless, we show in the electronic supplementary material, §2.1, that using such a criterion preserves the qualitative features reported below (in particular, the scaling of the erasure time with friction in the high- and low-friction limits). Instead, we define an erasure region in well '0' as all points $(x,p)$ with total energy $H(x,p) \le A - 3k_BT$, where $A$ is the barrier height. We take the average first passage time to reach the erasure region, for particles initiated in well '1', to be representative of the erasing time scale. The choice of the $3k_BT$ criterion is somewhat arbitrary, but it has been used before by Vega et al. [36] to study atom-surface diffusion. As we show in the electronic supplementary material, §2.2, using $4k_BT$ makes no qualitative difference to our conclusions. This metric has the merit of providing a clear, computable criterion for erasing. Below, we demonstrate that particles within the $3k_BT$ erasure region do indeed have expected next passage times close to the reliability time, as required.
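The erasure criterion is a simple predicate on phase-space points. A minimal sketch (hypothetical function name; reduced units by default) using the resting-state Hamiltonian $H(x,p) = p^2/2m + U_{A,B}(x)$ and requiring $x < 0$ so that the point lies in well '0'; the `margin` parameter allows the $4k_BT$ variant mentioned above to be checked as well:

```python
import numpy as np

def in_erasure_region(x, p, A, B=1.0, m=1.0, kT=1.0, margin=3.0):
    """True where (x, p) lies in well '0' with resting-state energy H <= A - margin*kT."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    H = p**2 / (2.0 * m) + A * (x**2 / B**2 - 1.0) ** 2
    return (x < 0.0) & (H <= A - margin * kT)
```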
For a range of well parameters, we used the Langevin A algorithm from [37] (see the electronic supplementary material, §1, for integrator set-up and validation) to estimate $\tau(x,p)$, the average barrier crossing time for particles initialized at position $x$ with momentum $p$ in the left well, for a grid of points $(x,p)$. The average reliability time for a given well can be approximated in terms of $\tau(x,p)$ as the Gibbs-weighted average over the grid,

$$\tau_r \approx \frac{\sum_{(x,p)} \pi_0(x,p)\,\tau(x,p)}{\sum_{(x,p)} \pi_0(x,p)}.$$

The deviation $\delta(x,p) := |1 - \tau(x,p)/\tau_r|$ for every point $(x,p)$ in the grid is plotted in figure 2, for a range of friction parameters at well height $A = 7$. It is clear that, for all values of friction, the points with total energy $H(x,p) \le A - 3k_BT$ have reliability times close to $\tau_r$. The same is true of other well heights $A$. This is because such particles typically undergo thermal mixing before they can escape the well. Once mixed, their next escape over the barrier will be on a time scale of the order of $\tau_r$. Despite the robustness of this result to the value of the friction, the heatmaps in figure 2 are friction dependent. When $\gamma$ is low, the particle diffuses very slowly in energy space, and it is the challenge of diffusing within this energy space that prohibits escape from the well. As a result, the heatmap corresponding to $\gamma = 0.1$ (figure 2a) follows the shape of constant-energy contours. As friction increases (e.g. in figure 2b,c), diffusion in momentum space becomes more rapid, but diffusion in position space slows down. Once $\gamma$ becomes very high (e.g. $\gamma = 100$ in figure 2d), the behaviour of the heatmap is essentially determined by the initial position of the particle; particles close to the barrier, with $U_{A,B}(x)$ sufficiently close to $A$, can escape easily, whatever their momentum. Using the total energy $H(x,p)$ as a criterion ensures that we account for all regimes of friction.
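Given a grid of simulated crossing times $\tau(x,p)$ in the left well, the Gibbs-weighted average and the deviation map $\delta(x,p)$ can be computed as follows. This is a sketch under stated assumptions: the $(x,p)$ grid is uniform, so the $\pi_0$ weights reduce to $e^{-H(x,p)/k_BT}$ up to normalization, and `tau_grid` is a hypothetical array standing in for simulation output.

```python
import numpy as np

def reliability_from_grid(x, p, tau_grid, A, B=1.0, m=1.0, kT=1.0):
    """Gibbs(pi_0)-weighted mean of tau(x, p) on a uniform left-well grid, plus delta(x, p).

    tau_grid[i, j] is the mean crossing time simulated from (x[i], p[j]).
    """
    X, P = np.meshgrid(np.asarray(x), np.asarray(p), indexing="ij")
    H = P**2 / (2.0 * m) + A * (X**2 / B**2 - 1.0) ** 2
    w = np.exp(-(H - H.min()) / kT)        # unnormalised pi_0 weights; the shift cancels
    tau_r = float(np.sum(w * tau_grid) / np.sum(w))
    delta = np.abs(1.0 - tau_grid / tau_r)  # deviation map, as plotted in a heatmap
    return tau_r, delta
```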
Since we are interested in the typical time scale for transferring particles from their current well into the other well, we sample initial points only from the right well. We define the erasing time $\tau_e$ as the expected time to hit the erasure region, given that the particle started in the right-hand well:

$$\tau_e = \mathbb{E}\left[\inf\{t \ge 0 : x(t) < 0,\ H(x(t), p(t)) \le A - 3k_BT\}\right],$$

where $H$ is the resting-state Hamiltonian and $(x(t), p(t))$ is the solution to equation (2.7) with the initial condition $(x(0), p(0)) \sim \pi_1(x,p)$. Given this definition, $\tau_e$ indicates a typical time scale over which the control must be applied to successfully erase a large fraction of the bits. In practice, the control would be applied for a period $\tau > \tau_e$ to achieve high accuracy. We will use $\tau_e$ as an indicative time scale of the control operation for the purposes of our analysis. It is useful to decompose the erasing time $\tau_e$ as the sum of two times: the transport time and the mixing time.
- Transport time ($\tau_t$). The time taken by the particle to reach well '0', given that it is initially distributed according to $\pi_1(x,p)$:
$$\tau_t = \mathbb{E}\left[\inf\{t \ge 0 : x(t) \le 0\}\right]. \tag{2.10}$$
- Mixing time ($\tau_m$). The time taken by the particle to mix sufficiently inside the well. This is the time from when the particle first reaches well '0' to when it first hits the erasure region:
$$\tau_m = \tau_e - \tau_t. \tag{2.11}$$
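The decomposition can be read off a single sampled trajectory by locating two hitting times; averaging over trajectories then estimates $\tau_t$, $\tau_e$ and $\tau_m$. A minimal post-processing sketch, assuming the trajectory arrays come from some simulation of equation (2.7) (the function name is hypothetical). Since the erasure region lies inside $x < 0$, the first erasure hit can never precede the first arrival in well '0', so $\tau_m \ge 0$.

```python
import numpy as np

def split_erasing_time(t, x, p, A, B=1.0, m=1.0, kT=1.0):
    """Return (tau_t, tau_e, tau_m) for one sampled trajectory, or None if it never erases."""
    t, x, p = (np.asarray(a, dtype=float) for a in (t, x, p))
    H = p**2 / (2.0 * m) + A * (x**2 / B**2 - 1.0) ** 2    # resting-state energy
    reached = np.flatnonzero(x <= 0.0)                      # arrival in well '0'
    erased = np.flatnonzero((x < 0.0) & (H <= A - 3.0 * kT))  # erasure region
    if reached.size == 0 or erased.size == 0:
        return None
    tau_t, tau_e = t[reached[0]], t[erased[0]]
    return tau_t, tau_e, tau_e - tau_t
```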

(iii) Cost of erasing
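In this section, we calculate the work done in erasing a bit. From Sekimoto's expression [38,39], the average work done by a protocol with time-dependent control potential $V(x,t)$ applied for a time $\tau$ (including the instants at which the control is switched on and off) is

$$W = \int_0^{\tau}\!\!\int \frac{\partial V(x,t)}{\partial t}\,p(x,t)\,dx\,dt,$$

where $p(x,t)\,dx$ is the probability that the particle is between positions $x$ and $x + dx$ at time $t$. There are two potential sources of work that appear in our calculation.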
1. When we begin the erasure protocol by switching on the control to lift the particle.
2. At the end of the protocol, when we switch off the control.
We note that, in our family of controls, there is negligible energy recovered when the control is switched off (see electronic supplementary material, §3.2), since the probability of the particle being in the region in which the control is applied is small. More generally, the question of whether energy might be recovered from small systems and stored efficiently is a complex one, despite the optimism shown in previous discussions of erasing. Indeed, current technology does not attempt to recover any energy from bits.
We now calculate the work done for our protocol (example 2.1). The particle's initial potential energy is approximately $k_BT/2$ on average, by the equipartition theorem (treating each well as approximately harmonic). After the control is switched on, the average potential energy is $A + F B$ for a particle in the right well, since the particle is localized around $x = B$, and still $k_BT/2$ for a particle in the left well. Since half of the particles start in each well, and ignoring energy recovery at the end of the operation, the net work done for the erasure protocol is $W = (A + F B - k_BT/2)/2$. As justified analytically and numerically in the electronic supplementary material, §3.1, this approximation is accurate for the values of $A$ and $F$ that we consider, and we will use this form of the work for the rest of the manuscript.

Observation 2.2.
Work is an increasing function of the well height $A$ at fixed $F$ and $\gamma$. This follows immediately from the expression for work, $W = (A + FB - k_BT/2)/2$.
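The work estimate is simple enough to state as a function. The sketch below (hypothetical name; reduced units $B = k_BT = 1$ by default) makes observation 2.2 explicit, since $\partial W/\partial A = 1/2 > 0$; it inherits the approximations above (equipartition in the wells, no energy recovered at switch-off).

```python
def erasure_work(A, F, B=1.0, kT=1.0):
    """Approximate mean work for example 2.1: W = (A + F*B - kT/2) / 2.

    Right-well particles (half of them) are lifted from ~kT/2 to ~A + F*B;
    left-well particles are unaffected; no energy is recovered at switch-off.
    """
    return 0.5 * (A + F * B - 0.5 * kT)
```

For example, `erasure_work(7.0, 2.0)` gives $4.25$ in reduced units.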

Friction-based trade-offs for reliability and erasing
We explore the behaviour of the reliability and erasing time scales as functions of the friction coefficient. We find that both these time scales are non-monotonic, roughly U-shaped functions of the friction coefficient. A high reliability time requirement is favoured by a very low or very high friction, whereas a low erasing time requirement is helped by the choice of a moderate value of friction. Since a bit designer would seek reliable bits (needing high or low friction) that can be erased fast (needing intermediate friction), this yields a friction-based trade-off between reliability and speed of erasure.

(a) Reliability time
… et al. [42] gave formulae that interpolate accurately over all values of friction (see the review in [43]). We will apply the result of Mel'nikov and Meshkov to estimate the escape rate $k$ for our bistable system:

Here, $\omega_b$ is the angular frequency at the barrier top, $\omega_0$ is the angular frequency at the bottom of the well and $I(A)$ is the action for barrier height $A$ (see the electronic supplementary material, §5, for a detailed definition of these parameters and calculations for our system). We plot the analytical prediction of $1/k$ given by equation (3.1) in figure 4 for two values of well height $A$, as a function of friction $\gamma$. This prediction is compared with the average first passage time for particles to reach the top of the barrier from an initial Boltzmann distribution within a single well. The two quantities differ at large $\gamma$ because Kramers' definition does not treat a particle that crosses the barrier but then immediately crosses back as having 'escaped', whereas our definition of reliability in terms of a first passage time treats such particles as no longer reliable. In the underdamped regime, immediate recrossings are rare and hence $\tau_r$ and $1/k$ coincide; in the overdamped regime, particles that reach the barrier top have a 50% chance of returning, and so $\tau_r = 1/(2k)$.
As can be seen from figure 4, $\tau_r$ smoothly interpolates between $1/k$ and $1/(2k)$, with the small numerical factor providing only a minor correction to the underlying physics of the analytical expression in equation (3.1).
The Mel'nikov-Meshkov expression predicts an almost-exponential scaling of $1/k$ with barrier height $A$, which is reproduced by $\tau_r$ and expected from the Arrhenius rate law [44]. Note that both $1/k$ and $\tau_r$ are non-monotonic in friction $\gamma$, with long reliability times in the underdamped and overdamped limits. This behaviour results from the need for particles to diffuse in both position and energy in order to reach the top of the barrier from an initial state thermalized within a single well. At high friction, particles rapidly sample different kinetic energies due to strong coupling with the environment, but move slowly in position space and hence take a long time to cross the barrier. At low friction, particles can move rapidly, but their energy remains effectively constant over short time periods; they only cross the barrier once they have eventually gained enough total energy. Intermediate friction, at which neither process is excessively slow, gives the shortest $\tau_r$. This behaviour is typical of equilibrating systems in which an initial out-of-equilibrium condition (particles are guaranteed to be on one side of the barrier and not on the other) relaxes towards an equilibrium state (particles on both sides of the barrier), and is thus insensitive to the details of our bit design.
A more detailed analysis of the dependence of the reliability time on various parameters, and indeed the functional form of the well, is possible. However, these details are not necessary for the conclusions we draw in the rest of this manuscript, and hence we omit them here.

(b) Erasing time
As noted earlier, the erasing time is composed of two parts: the transport time defined in equation (2.10) and the mixing time defined in equation (2.11). We now present analytical estimates of these times and compare them with numerical solutions.

(i) Transport time
We can obtain an analytical estimate of the transport time in the low- and high-friction limits by assuming that a particle starting at $x = B$ moves deterministically under the influence of the potential slope and the drag force.
1. Low-friction regime. The particle travels with a constant acceleration of $F/m$, and the time taken to travel a distance $B$ is $\tau_t \approx \sqrt{2mB/F}$.
2. High-friction regime. In this regime, we assume that the net force on the particle (the sum of the drag force and the force from the potential) is zero. The particle then travels with a velocity of $F/(m\gamma)$, and the time taken to travel a distance $B$ is $\tau_t \approx mB\gamma/F$.
We thus expect the transport time to be independent of friction in the underdamped regime and to increase linearly with friction in the overdamped regime. Figure 5 illustrates that this scaling is observed in Langevin simulations, and that numerical values are in reasonable agreement with these crude estimates. The largest quantitative deviations occur at low force and low friction (e.g. $F = 1$ in figure 5a), when the diffusion of the particle on the slope contributes significantly to $\tau_t$. This results in a simulated transport time larger than the analytical estimate.

(ii) Mixing time
1. Low-friction regime. For the purposes of an approximate calculation, we treat well '0' as a harmonic oscillator. Deterministically, the energy of an underdamped harmonic oscillator decays exponentially, $E(t) = E_0 e^{-\gamma t}$, where $E_0$ is the energy of the particle when it first reaches $x = 0$ and $E(t)$ is its energy at time $t$. In the underdamped regime, a particle starting at $B$ arrives at position $x = 0$ having lost negligible energy on the slope, so that $E_0 \approx A + FB$. Requiring $E(\tau_m) = A - 3k_BT$ then gives
$$\tau_m \approx \frac{1}{\gamma}\ln\left(\frac{A + FB}{A - 3k_BT}\right). \tag{3.2}$$
2. High-friction regime. A sensible estimate of the behaviour can be obtained by explicitly modelling the diffusion of the particle near the barrier top. In the overdamped limit, the criterion of reaching a total energy of $E(\tau_m) = A - 3k_BT$ is equivalent to reaching the position $x = -d$ in well '0' at which the potential energy is $A - 3k_BT$, since momenta are resampled arbitrarily rapidly in this limit. To proceed, we consider the typical time required to reach an absorbing barrier at $x = -d$ starting from $x = 0$, assuming that $F$ is large enough for $x = 0$ to be treated as a reflecting barrier.
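The two deterministic transport-time limits can be written down directly. A minimal sketch (hypothetical function name; reduced units by default) under the same crude assumptions: uniform acceleration $F/m$ over a distance $B$ at low friction, and terminal velocity $F/(m\gamma)$ at high friction.

```python
import math

def transport_time_estimates(F, gamma, B=1.0, m=1.0):
    """Crude transport-time estimates over a distance B on a slope exerting force F.

    Low friction:  B = (F/m) t^2 / 2           ->  t = sqrt(2 m B / F)  (independent of gamma)
    High friction: terminal velocity F/(m*gamma) ->  t = m B gamma / F  (linear in gamma)
    """
    low = math.sqrt(2.0 * m * B / F)
    high = m * B * gamma / F
    return low, high
```

In reduced units, `transport_time_estimates(2.0, 10.0)` returns `(1.0, 5.0)`.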
Starting from the overdamped stochastic differential equation

$$dx = -\frac{1}{m\gamma}\,\partial_x U_{A,B}(x)\,dt + \sqrt{\frac{2k_BT}{m\gamma}}\,dW, \tag{3.3}$$

we apply the standard methods outlined in Pavliotis [25, (7.1), p. 239], which lead to the following boundary-value problem for the average mixing time $\tau_m(x)$ as a function of the initial position $x$, with an absorbing barrier at $x = -d$ (a distance $d$ into well '0') and a reflecting barrier at $x = 0$:

$$\frac{k_BT}{m\gamma}\,\tau_m''(x) - \frac{1}{m\gamma}\,\partial_x U_{A,B}(x)\,\tau_m'(x) = -1, \qquad \tau_m(-d) = 0, \qquad \tau_m'(0) = 0. \tag{3.4}$$

We can solve equation (3.4) to get

$$\tau_m(0) = \frac{m\gamma}{k_BT}\int_{-d}^{0} dy\; e^{U_{A,B}(y)/k_BT}\int_{y}^{0} dz\; e^{-U_{A,B}(z)/k_BT}, \tag{3.5}$$

where we have approximated the potential near the barrier as an inverted harmonic oscillator, $U_{A,B}(x) \approx A - 2Ax^2/B^2$, to estimate $d = B\sqrt{3k_BT/2A}$. Repeating this approximation within the integral, we obtain

$$\tau_m(0) \approx \frac{m\gamma B^2}{2A}\int_{0}^{\sqrt{3}} du\; e^{-u^2}\int_{0}^{u} dv\; e^{v^2}. \tag{3.6}$$

Equations (3.2) and (3.6) predict that the mixing time will scale as $1/\gamma$ in the low-friction limit and as $\gamma$ in the high-friction limit. In the first case, mixing within the well is limited by the rate at which the particle can reduce its total energy, whereas in the second it is determined by the speed with which the particle can diffuse in position space to a configuration with lower potential energy. We plot simulation results for the mixing time, along with the analytic predictions, in figure 6, confirming this scaling and the resultant non-monotonicity. Quantitatively, simulation results deviate from the crude analytic predictions at low force (e.g. $F = 1$ in figure 6a), when it is no longer reasonable to treat $x = 0$ as either a reflecting barrier or a steep side of a harmonic well. Instead, excursions of the particle back onto the slope occupying the region $x > 0$ lead to much larger mixing times. Nonetheless, the scaling and non-monotonicity in friction are preserved. Combining $\tau_t$ and $\tau_m$ gives $\tau_e$, plotted in figure 7. Analytically, the erasing time is given as:
1. Low-friction regime: $\tau_e \approx \sqrt{2mB/F} + \tau_m$, with $\tau_m$ given by equation (3.2).
2. High-friction regime: $\tau_e \approx mB\gamma/F + \tau_m$, with $\tau_m$ given by equation (3.6).
Like reliability, the erasing time is large in the underdamped and overdamped limits and minimized at intermediate values of friction. The physical cause is the same as before: our erasing protocol sets the system into a non-equilibrium state and then waits for it to relax towards equilibrium in the perturbed potential.
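For overdamped dynamics with a reflecting barrier at $x = 0$ and an absorbing barrier at $x = -d$, the mean first-passage time from $x = 0$ has the standard double-integral form $\tau(0) = (m\gamma/k_BT)\int_{-d}^{0}dy\,e^{U(y)/k_BT}\int_{y}^{0}dz\,e^{-U(z)/k_BT}$. The sketch below (hypothetical helper names; trapezoid quadrature; reduced units by default) evaluates it for the full quartic potential and for the inverted-harmonic approximation, using $d = B\sqrt{3k_BT/2A}$ in both. It illustrates the linear scaling with $\gamma$ and how small the harmonic correction is at large $A$; it is not the authors' calculation.

```python
import numpy as np

def _trapz(f, h):
    """Trapezoid rule on a uniform grid with spacing h."""
    return np.sum(0.5 * h * (f[1:] + f[:-1]))

def overdamped_mixing_time(A, gamma, B=1.0, m=1.0, kT=1.0, n=2001):
    """Mean first-passage time from x=0 (reflecting) to x=-d (absorbing), full quartic U."""
    U = lambda x: A * (x**2 / B**2 - 1.0) ** 2
    d = B * np.sqrt(3.0 * kT / (2.0 * A))       # inverted-harmonic estimate of d
    y = np.linspace(-d, 0.0, n)
    h = y[1] - y[0]
    g = np.exp(-(U(y) - A) / kT)                # exp(-U/kT), shifted by A for safety
    seg = 0.5 * h * (g[1:] + g[:-1])            # trapezoid pieces on [y_j, y_{j+1}]
    inner = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])  # int_{y_i}^{0} g dz
    outer = np.exp((U(y) - A) / kT) * inner     # the +A and -A shifts cancel
    return (m * gamma / kT) * _trapz(outer, h)

def inverted_harmonic_mixing_time(A, gamma, B=1.0, m=1.0, n=2001):
    """(m gamma B^2 / 2A) * int_0^sqrt(3) du e^{-u^2} int_0^u dv e^{v^2}."""
    u = np.linspace(0.0, np.sqrt(3.0), n)
    h = u[1] - u[0]
    ev = np.exp(u**2)
    inner = np.concatenate([[0.0], np.cumsum(0.5 * h * (ev[1:] + ev[:-1]))])
    return (m * gamma * B**2 / (2.0 * A)) * _trapz(np.exp(-u**2) * inner, h)
```

Because the quartic term only raises $U(y) - U(z)$ for $|z| \le |y|$, the quartic result should lie slightly above the inverted-harmonic one.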
This process requires the system both to diffuse in energy space and to explore configuration space, and is therefore favoured by intermediate friction. Specifically, if the friction is too low, the particle oscillates and only slowly loses enough energy to be confined within the desired well. If the friction is too high, both the transport and mixing times increase because the particle's movement through space is so slow. The relative importance of these effects can be seen in figure 8. We note that the value of the damping $\gamma$ that minimizes $\tau_e$ is quite sensitive to $F$ (figure 8). Fundamentally, a larger $F$ makes moving in position space easier, but also means that a greater loss of energy is needed to reach equilibrium. Therefore, a higher friction coefficient is optimal. As with the reliability time, further analysis is possible but not necessary for the conclusions we wish to draw. Once again, the key point is the trade-off between high and low friction, which is not specific to our control. Indeed, it is likely to be quite generic, since any protocol will necessarily push the system out of equilibrium and will require particles to be typically confined within the target well before the control is removed.
Both erasing and reliability times exhibit a trade-off in friction, being minimized by intermediate values. This fact sets up a second trade-off between designing bits with extreme values of friction to optimize reliability, or moderate values of friction to optimize erasing. The consequences of this secondary trade-off will be explored in §4.

(iii) Additional dependencies of the erasing time
A larger value of $A$ implies a steeper descent into the target left-hand well, making mixing faster. We therefore expect the mixing time, and hence the erasing time, to decrease monotonically with $A$. By contrast, the erasing time shows a non-monotonic dependence on $F$ at fixed $A$ and $\gamma$. Applying too little force leads to slow transport and does not effectively trap the particle within the target well. But applying too much force supplies the particle with too much energy, which must subsequently be lost during the mixing period. Since the erasing time decreases monotonically with $A$ at fixed $F$ and $\gamma$, and depends non-monotonically on $F$ at fixed $A$ and $\gamma$, $\tau_e$ also depends non-monotonically on $F$ at fixed $W$ and $\gamma$ (in reduced units, fixing $W = (A + F - 1/2)/2$ is equivalent to fixing $A + F$). We illustrate this non-monotonicity in figure 10, in which simple regression formulae have been fitted to the simulation data to enable interpolation at fixed $W$ and $\gamma$ (see electronic supplementary material, §4). As friction increases, the force required to provide the particle with excess energy increases, leading to minima at higher values of $F$. We make the following observation, which will be used in the subsequent section.
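Scanning $\tau_e$ at fixed work requires parametrizing a level set of $W$. Using the work estimate $W = (A + FB - k_BT/2)/2$, each choice of $A$ fixes $F = (2W + k_BT/2 - A)/B$, and only $F > 0$ is admissible in example 2.1. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def work_level_set(W, A_values, B=1.0, kT=1.0):
    """Designs (A, F) with (A + F*B - kT/2)/2 == W, keeping only F > 0 (example 2.1)."""
    A_values = np.asarray(A_values, dtype=float)
    F_values = (2.0 * W + 0.5 * kT - A_values) / B
    keep = F_values > 0.0
    return A_values[keep], F_values[keep]
```

Evaluating a fitted or simulated $\tau_e(A, F, \gamma)$ along the returned pairs at fixed $\gamma$ traces out curves of the kind shown in figure 10.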

Observation 3.2.
We have found no evidence of multiple local minima of the erasing time in a level set of work for our control family (see the electronic supplementary material, §6, for characteristic plots showing the minima of erasing time in a level set of work). Physically, this is unsurprising, as the non-monotonicity in $\tau_e$ with $\gamma$ and $F$ mentioned above arises from fairly simple trade-offs, producing curves with single minima.
As with the reliability time, a more detailed analysis of the dependence of τ e on other parameters, and even the shape of the control, is possible. However, these details are likely to be difficult to generalize, and are not necessary for the conclusions we draw in the subsequent sections.

Design of bits
We are now ready to study the question of how to design good bits. A design involves choosing parameters A, F, γ for a bit to satisfy requirement specifications in terms of speed of erasing and reliability, without expending more work than required. The most general formulation of our problem would require us to also allow the length scale B, the temperature T and the mass m to vary, as well as allowing arbitrary controls. Such a formulation would appear to make the problem even more challenging, so it seems prudent to restrict our first analysis to the variables A, F and γ . Our restricted analysis is not without value since the underlying technology in any given construction typically does not allow arbitrary variation. Our numerical analysis with example 2.1 will guide us in our assumptions and analyses, but our results will hold in greater generality. We will construct our proofs based on general assumptions, and subsequently explain how these assumptions are met by our control family.
We introduce the following terms.
… More informally, a trapped design has the lowest erasing time within a level set of work; a trapped design is unique if it is the only design within that level set of work to have the minimal erasing time; and a locally trapped design has the minimal erasing time within a local neighbourhood of designs of equal work.
7. A requirement specification $(t_r, t_e)$ is unsaturated iff there exists a $(t_r, t_e)$-optimal design $(A, F, \gamma)$ such that either $\tau_r(A, F, \gamma) > t_r$ or $\tau_e(A, F, \gamma) < t_e$. A feasible requirement specification that is not unsaturated is called saturated.
Throughout this section, we assume that τ e , τ r and W are continuous functions. We state the main results on the properties of optimal designs here, leaving the detailed proofs to the electronic supplementary material. We first claim that an optimal design always saturates the bound on the erasing time constraint. Further, if the optimal bit is not locally trapped, then it also saturates the bound on the reliability time constraint.
… to reduce work; but reliability time does not depend on the control parameters.) Fix requirement specifications (t r , t e ) ∈ RS. Suppose (A, F, γ ) is a (t r , t e )-optimal design. Then
1. τ e (A, F, γ ) = t e .
2. If the design (A, F, γ ) is not locally trapped, then τ r (A, F, γ ) = t r .
The next claim provides insight into the geometry of optimal designs. In particular, it states that, under mild assumptions, the requirement space is divided into two regions by a boundary given by the reliability and erasing times of trapped designs. Requirements (t r , t e ) with t r < t r * and t e = t e * , where (t r * , t e * ) is a requirement on the dividing line, are unsaturated, while other requirement specifications are saturated.

Claim 4.2 (saturated and unsaturated requirements).
Assume that the erasing time of trapped designs is a strictly decreasing function of the work (see the electronic supplementary material, observation 7.1, for a justification), and that as before it is always possible to decrease work at a fixed reliability time. Let (A * , F * , γ * ) be a trapped design such that τ e (A * , F * , γ * ) = t e .
The claims about the saturation and unsaturation of time scales can also be proved using the Karush–Kuhn–Tucker (KKT) conditions, a standard tool from optimization theory (see the electronic supplementary material, §7).
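For orientation, the KKT conditions for the design problem take the following standard form. This is a sketch under our own assumptions for this illustration: W, τ e and τ r are differentiable at an optimum x* = (A, F, γ), a constraint qualification holds, and bounds such as A ≥ A min are inactive. The multipliers μ r and μ e are notation introduced here, and the formulation in the electronic supplementary material may differ in detail.

```latex
\begin{aligned}
&\min_{x}\; W(x) \quad \text{subject to} \quad t_r - \tau_r(x) \le 0, \qquad \tau_e(x) - t_e \le 0,\\
&\nabla W(x^*) - \mu_r \nabla \tau_r(x^*) + \mu_e \nabla \tau_e(x^*) = 0,\\
&\mu_r \ge 0, \qquad \mu_e \ge 0,\\
&\mu_r \bigl(\tau_r(x^*) - t_r\bigr) = 0, \qquad \mu_e \bigl(t_e - \tau_e(x^*)\bigr) = 0.
\end{aligned}
```

Complementary slackness ties the multipliers to saturation: μ e > 0 forces τ e (x*) = t e , while τ r (x*) > t r forces μ r = 0, leaving ∇W(x*) = −μ e ∇τ e (x*), so that x* is a stationary point of τ e on its level set of work, as a locally trapped design would be.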
Figure 11 gives a more intuitive picture of these results. It illustrates how finding an optimal design subject to a specification maps a point in the requirement space to a point in the design space. For a trapped design (A * , F * , γ * ), requirements with t r < τ r (A * , F * , γ * ) and t e = τ e (A * , F * , γ * ) are unsaturated and are mapped to the same design (A * , F * , γ * ) (claims 4.2(2) and 4.2(1)). If the design (A * , F * , γ * ) is uniquely trapped, then requirements with t r ≥ τ r (A * , F * , γ * ) and t e = τ e (A * , F * , γ * ) are saturated (claim 4.2(3)). Figure 12 illustrates these results for our example family of controls (example 2.1). As discussed in the electronic supplementary material, §4, we used simple regression to fit the functions τ e (·) and τ r (·) to our simulation results. We then identified trapped designs by numerical minimization and plotted the requirement specifications saturated by these designs. For each trapped bit (A * , F * , γ * ), we randomly selected requirements with t e = τ e (A * , F * , γ * ), but with t r greater than, equal to or less than τ r (A * , F * , γ * ), and used numerical optimization to search for the optimal designs. The results support our analysis: requirements with t r < τ r (A * , F * , γ * ) are unsaturated, and those with t r ≥ τ r (A * , F * , γ * ) are saturated. Furthermore, as figure 12b shows, unsaturated requirements at fixed t e all map to the same trapped design.

(a) Optimal friction for simple controls
In §3, we demonstrated that both reliability and erasing times are non-monotonic in friction, with short erasing times favoured by moderate values of friction and long reliability times favoured by extreme values. In what follows, we quantify precisely the resulting trade-off in choosing the friction of an optimal bit. The analysis is significantly simplified for our family of controls, in which work is independent of the friction coefficient.

Figure 13. Regions of friction space can be eliminated from the search for optimal bits for our class of controls. As a result, the optimal friction either is critical damping or lies somewhere within the two regions of moderate friction. Illustrative curves of τ e and τ r at fixed A, F indicate these regions.

We introduce two characteristic friction coefficients at fixed A and F.
1. γ e crit is the friction coefficient that minimizes the erasing time as a function of friction coefficient γ at fixed A and F, i.e. for all γ ∈ R >0 , we have τ e (A, F, γ e crit ) ≤ τ e (A, F, γ ). We call the design (A, F, γ e crit ) critically damped.
2. γ r crit is the friction coefficient that minimizes the reliability time as a function of friction coefficient γ at fixed A and F, i.e. for all γ ∈ R >0 , we have τ r (A, F, γ r crit ) ≤ τ r (A, F, γ ).

Trapped bits are also critically damped: because work is independent of γ for our family of controls, the level set of work containing a trapped design (A * , F * , γ * ) includes every friction value at fixed A * and F * , so γ * must minimize τ e over γ. In figure 13, we show illustrative curves of the erasing and reliability times as a function of friction coefficient γ at fixed A, F. These curves have single minima at γ e crit and γ r crit , respectively. Also shown on these graphs are regions of friction space that can be eliminated from consideration for optimal bits. To eliminate extreme values of friction, we note that the design must have a minimal finite A to be a well-defined two-state system in the resting state; for our bit, A min ≈ 3. In the next claim, we describe precisely which regions of friction can be eliminated.
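Because each curve has a single minimum at fixed A and F, γ e crit and γ r crit can be located with a bracketing method for unimodal functions; golden-section search is one such method. The time scales in this sketch are hypothetical toy forms, not our regression fits, chosen so that γ e crit = 2 > γ r crit = 1, the ordering treated first in the proof of claim 4.3.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)

# Hypothetical toy time scales at fixed A and F (NOT the paper's fits).
A, F = 4.0, 1.0

def tau_e(gamma):
    return (A / F) * (1.0 / gamma + gamma / 4.0)  # minimum at gamma = 2

def tau_r(gamma):
    return math.exp(A) * (1.0 / gamma + gamma)  # minimum at gamma = 1

gamma_e_crit = golden_section_min(tau_e, 1e-3, 100.0)
gamma_r_crit = golden_section_min(tau_r, 1e-3, 100.0)
print(gamma_e_crit, gamma_r_crit)  # approximately 2 and 1
```

Golden-section search needs only function evaluations, which suits time scales known through simulation or regression fits; it relies on the single-minimum property, and with multiple minima it could return a non-global one.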

Claim 4.3 (forbidden regions for optimal friction).
Assume that both τ e and τ r have a single, well-defined minimum and tend to infinity as γ tends to zero or infinity. Let (A, F, γ ) be a (t r , t e )-optimal design (see figure 13 for notation).

Proof.
1. We prove part 1 for the case γ e crit > γ r crit ; the other case proceeds in identical fashion. For contradiction, assume that γ ∈ (γ 0 , γ e crit ). Then … τ e and τ r , and the fact that τ r tends to infinity as γ tends to zero or infinity, there exists a design (A, F, γ′ ) with γ′ > γ 0 and τ r (A, F, γ′ ) = τ r (A, F, γ ) ≥ t r , but τ e (A, F, γ′ ) < τ e (A, F, γ ) ≤ t e . The design (A, F, γ′ ) is (t r , t e )-optimal since it is (t r , t e )-feasible and has W(A, F, γ′ ) = W(A, F, γ ), contradicting lemma 4.1(1) that the optimal bit saturates the bound on the erasing time constraint.
2. For contradiction, suppose that γ < γ 1 or γ > γ 2 . Then, since A ≥ A min and the reliability time increases with well height and with more extreme values of γ , either τ r (A, F, γ ) ≥ τ r (A min , F, γ ) > τ r (A min , F, γ 1 ) = t r or τ r (A, F, γ ) ≥ τ r (A min , F, γ ) > τ r (A min , F, γ 2 ) = t r , contradicting claim 4.1(2) that an optimal design that is not locally trapped saturates the bound on the reliability time constraint.

Figure 14. Optimal friction either is critical damping or lies within a small region adjacent to critical damping, for our family of controls. We plot friction for optimal designs (A, F, γ ) against erasing time requirements (t e ) for a fixed value of reliability time requirement (t r ), alongside γ e crit and γ r crit . Note that A and F are not fixed, but determined by the optimization procedure alongside the optimal friction for each requirement (t r , t e ). The data were obtained from numerical optimization and minimization based on regression fits to simulation data. (a) Reliability time requirement t r = 500 and (b) reliability time requirement t r = 10 000.
For clarity, let us assume initially that γ e crit > γ r crit (equivalent arguments hold for the alternative). Optimal designs either reside at γ e crit or lie within two regions of moderate friction, as illustrated in figure 13. Interestingly, one region is adjacent to γ e crit , whereas the other is not. It is not easy to see how designs in the non-adjacent region (γ 1 ≤ γ ≤ γ 0 ) could outperform those in the adjacent region (γ e crit < γ ≤ γ 2 ). Indeed, when we performed numerical optimization on the regression-based fits to our simulation data, we observed only optimal bits that are either critically damped or lie in the allowed region adjacent to critical damping. This is illustrated in figure 14, where we plot the optimal friction as a function of the erasing time requirement at fixed reliability time requirement, for two values of the reliability time requirement. We also plot γ e crit and γ r crit for comparison. At low erasing time requirements, designs reside at γ e crit . At slightly higher erasing time requirements, the requirements become saturated and the optimal friction lies adjacent to γ e crit , in the region γ e crit < γ ≤ γ 2 . As the erasing time requirement increases further, γ e crit eventually crosses γ r crit ; at the crossing point, γ = γ e crit = γ r crit . At higher values of the erasing time requirement, γ still occupies the region adjacent to γ e crit , which is now γ 1 ≤ γ ≤ γ e crit < γ r crit .
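The outer edges γ 1 and γ 2 of the allowed friction window solve τ r (A min , F, γ ) = t r on either side of γ r crit , as in part 2 of the proof of claim 4.3. The bisection sketch below assumes a hypothetical toy reliability time e^A (1/γ + γ), which does not depend on F and has its minimum at γ r crit = 1; it is not our fitted τ r , and only A min = 3 is taken from the text.

```python
import math

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # sign change in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change in [mid, hi]
    return 0.5 * (lo + hi)

A_MIN = 3.0  # the text quotes A_min of about 3 for our bit

def tau_r(A, gamma):
    # Hypothetical toy reliability time (NOT the paper's fit); minimum at gamma = 1.
    return math.exp(A) * (1.0 / gamma + gamma)

def friction_window(t_r, gamma_r_crit=1.0, lo=1e-6, hi=1e6):
    """Return (gamma_1, gamma_2) with tau_r(A_min, gamma) = t_r on each side of gamma_r_crit."""
    def excess(gamma):
        return tau_r(A_MIN, gamma) - t_r
    gamma_1 = bisect_root(excess, lo, gamma_r_crit)
    gamma_2 = bisect_root(excess, gamma_r_crit, hi)
    return gamma_1, gamma_2

gamma_1, gamma_2 = friction_window(t_r=500.0)
print(gamma_1, gamma_2)
```

For this toy form the roots are also available in closed form, γ 1,2 = (s ∓ √(s² − 4))/2 with s = t r e^(−A min ), which provides a check on the numerics; for fitted time scales only the bracketing step carries over.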

Conclusion
We have explored the question of the design of optimal bits. Previously, authors have focused on designing optimal protocols that minimize work input when implementing a finite-time operation on a given system [8,45–48]. Our approach differs in considering that bits need to have two distinct functionalities: retaining data for long periods of time and allowing rapid switching or erasing. Moreover, we consider optimizing over system parameters such as the intrinsic friction as well as over the external control. Our fundamental observation is that friction plays a non-trivial role in the design of bits. Both switching/erasing and the eventual degradation of data involve relaxation towards equilibrium from a non-equilibrium distribution. This process is fastest at intermediate values of the friction, but slow in the overdamped and underdamped regimes. The best bit designs have high reliability times and low switching/erasing times, which implies an inherent trade-off in bit design between extreme values of friction that favour high reliability and moderate values of friction that favour rapid switching or erasing. We have explored the consequences of this biphasic role of friction for a simple class of controls. The existence of non-trivial minima of erasing time in the level set of work gives rise to trapped designs. These designs are optimal for reliability requirements smaller than their own reliability time, so such requirements are unsaturated. The result of the trade-off between extreme values of friction that maximize reliability time and moderate values of friction that minimize erasing time is that optimal designs are either critically damped or occupy a region of moderate friction close to critical damping.
Our work opens up a new perspective on the design of efficient computational devices, showing that the best designs are likely to be neither underdamped nor overdamped. This observation is particularly important as some authors have considered friction to be inherently problematic for computation [20–23]. Equally, the role of friction is suppressed when bits are modelled as discrete two-state systems [2,9,49], since this approximation assumes rapid equilibration within the discrete states.
We have only considered a simple family of controls to motivate our analysis and illustrate our findings. This family is not optimal; it was chosen for its simplicity and ease of analysis. Moreover, there is some arbitrariness in the definition of both the erasing and reliability times. As such, the precise numerical values we obtain are not very important. We do not claim, for example, to have derived numerical corrections to the minimal cost of erasing a bit, and the specific work costs we obtain (substantially larger than k B T ln 2) are not particularly informative. Rather, it is the qualitative results, which hold for a much broader class of controls, that are important. The non-monotonic role of friction in both the erasing and reliability time scales is a generic physical phenomenon that extends beyond the details of our implementation, and implies a competition between the goals of fast manipulation and long reliability times. Relatively weak assumptions (that it is always possible to decrease work at fixed reliability time, and that the minimal erasing time decreases with increased work) imply, respectively, that erasing time requirements are always saturated by optimal bits and that trapped designs lead to unsaturated reliability time requirements. Other results rely more on the simplicity of the control family: the existence of only one local minimum of erasing time at fixed work simplifies the question of whether a requirement specification is saturated, and the fact that work is independent of friction simplifies the task of eliminating certain values of friction as suboptimal.
Explicit exploration of a broader class of controls, including those with more complex variation over time, and of varying parameters such as particle mass and the distance between wells, is a possible direction for future work. It is not immediately clear, for example, whether minima in erasing time at fixed work cost will become more or less prominent features of the optimization landscape when the complexity of the system is increased. One common idea is to raise or lower the barrier between metastable states [8,9,28,29]. Lowering the barrier during erasing potentially allows faster erasing at fixed reliability time and lower work cost. If such barriers could be raised and lowered arbitrarily far and quickly, it might be possible to circumvent any conflict between high reliability and short erasing time. However, real physical systems are not generally this flexible. Indeed, to apply a complex time-dependent control to a small colloid, experimenters typically use optical feedback traps [28,29], which do not generate true potentials and rely on the continuous input of energy to apply forces and perform feedback control. For physical protocols that permit only finite raising and lowering of barriers between metastable states, we expect our findings to apply to a family of protocols with optimal barrier manipulation. An alternative direction would be to consider similar effects in systems with inherently quantum mechanical behaviour.
Competing interests. We declare we have no competing interests.
Funding. A.D. is supported by the ROTH scholarship of the Department of Mathematics, Imperial College London. T.E.O. is funded by a University Research Fellowship from the Royal Society.