# **Asynchronous Reversible Computing Unveiled Using Ballistic Shift Registers**

K.D. Osborn<sup>1,2,3[,\\*](#page-0-0)</sup> and W. Wustmann<sup>1</sup>

<sup>1</sup> *The Laboratory for Physical Sciences at the University of Maryland, College Park, Maryland 20740, USA*

<sup>2</sup> *The Quantum Materials Center, University of Maryland, College Park, Maryland 20742, USA*

<sup>3</sup> *The Joint Quantum Institute, University of Maryland, College Park, Maryland 20742, USA*

(Received 17 March 2022; revised 31 January 2023; accepted 24 March 2023; published 10 May 2023)

Reversible logic can provide lower switching-energy costs relative to all irreversible logic, including those developed by industry in semiconductor circuits; however, more research is needed to understand what is possible. Superconducting logic, an exemplary platform for both irreversible and reversible logic, uses flux quanta to represent bits and the reversible implementation may switch state with low energy dissipation relative to the energy of a flux quantum. Here, we simulate reversible shift register gates that are ballistic: their operation is powered by the input bits alone. A storage loop is added relative to previous gates as a key innovation, which bestows an asynchronous property to the gate such that input bits can arrive at different times as long as their order is clearly preserved. The shift register represents bit states by flux polarity, both in the stored bit as well as the ballistic input and output bits. Its operation consists of the elastic swapping of flux between the stored and the moving bit. This is related to a famous irreversible shift register, developed prior to the advent of superconducting flux quanta logic (which used irreversible gates). In the base design of our ballistic shift register (BSR), there is one input port and one output port but we find that we can make other asynchronous ballistic gates by extension. For example, we show that two BSRs operate in sequence without power added, giving a 2-bit sequential memory. We also show that a BSR with two input and two output ports allows a bit state to be set from one input port and then conveyed out on either output port. The gate constitutes the first asynchronous reversible 2-input gate. Finally, for a better insight into the dynamics, we introduce a collective-coordinate model. We find that the gate can be described as motion in two coordinates subject to a potential determined by the input bit and initial stored flux quantum. 
Aside from the favorable asynchronous feature, the gate is considered practical in the context of energy efficiency, parameter margins, logical depth, and speed.

DOI: [10.1103/PhysRevApplied.19.054034](http://dx.doi.org/10.1103/PhysRevApplied.19.054034)

### **I. INTRODUCTION**

Modern superconducting digital logic [1,2] uses single-flux quanta (SFQs) as bits and has been developed through research programs since the 1980s [\[3\]](#page-17-0). Its circuits are built from inductors and resistively shunted Josephson junctions. Today, superconducting logic is primarily demonstrated in three types: rapid single-flux quantum (RSFQ) descendants [\[3–](#page-17-0)[7\]](#page-17-1), adiabatic quantum flux parametron (AQFP) logic [\[8](#page-17-2)[,9\]](#page-17-3), and reciprocal quantum logic (RQL) [\[10\]](#page-17-4). The logic gates use conditional activation of SFQ over potential energy barriers with an energy cost $\gg k_B T$. The energy cost is bounded from below by $\ln(2)k_BT$ per bit, due to the irreversible loss of information in logically irreversible gate operations [\[11\]](#page-17-5).

Arguably, the earliest influential SFQ work [\[12\]](#page-17-6) is a proposal for a "Flux Shuttle" by the Bell Laboratories team of Anderson, Dynes, and Fulton [\[13](#page-17-7)[,14\]](#page-17-8). The device shows that a SFQ can be advanced within the circuit from one storage location to the next by an external current pulse; since the SFQ can represent a bit state, the shuttle demonstrates a precursor to a shift register. This is similar to a shift register in RSFQ [\[15](#page-17-9)[,16\]](#page-17-10) but in RSFQ the shift forward is caused by the arrival of another SFQ (a clock SFQ) rather than an external current pulse. Neither structure is thermodynamically reversible, because both rely on activated dynamics with resistive elements for damping. This damping is needed in irreversible logic to allow the circuit to quickly reach a new steady state.

Here, we introduce and study ballistic shift registers (BSRs). Due to their reversible design (time-reversal symmetry), they can in principle incur an energy cost $< \ln(2)k_BT$ per bit operation. The devices operate without shunt resistors and a key feature is that the inertia of the input bits solely powers an operation that leaves the device near a new steady state after the operation. Unlike the "asymmetric" bit-state encoding in RSFQ, based on the presence versus the absence of SFQ, BSRs use the two degenerate flux states (polarities) to represent the bit states, both for the moving bit (at input and output) and the stored bit. Since the data SFQ travels freely and the gate is unpowered, it is considered "ballistic." In the gate, the moving SFQ interacts with a SFQ stored by an inductor. Nonlinear dynamics allow an input bit to transform the stored state to the input bit state and also to carry away the input bit energy as an output SFQ with the previously stored bit state. This constitutes a shift register operation from the forward-scattering dynamics of a SFQ. To accomplish the near-ideal reversible dynamics, the gates utilize capacitive shunts in the Josephson junctions (JJs) within the gate circuit.

<span id="page-0-0"></span><sup>\*</sup>corresponding author: osborn@lps.umd.edu (alternatively, kosborn@umd.edu)

The BSR gates expand upon previous work [\[17](#page-17-11)[–19\]](#page-17-12) on a reversible logic named reversible-fluxon logic (RFL), which uses ballistic SFQs moving along long Josephson junctions (LJJs), where the SFQs are spatially extended and are referred to as fluxons. Ballistic RFL gates always use LJJs in pairs named bit lines, that carry ballistic bits into and out of the gate. In that work, the ballistic gates for a controlled-NOT (CNOT) [\[18\]](#page-17-13) and NOT SWAP (NSWAP) [\[17\]](#page-17-11) require synchronized input bits. RFL is distinct from the reversible circuits of the parametric quantron [\[20\]](#page-17-14), the negative-mutual-inductance superconducting quantum interference device (nSQUID) [\[21](#page-17-15)[–23\]](#page-17-16) and AQFP [\[24\]](#page-17-17), because those circuits adiabatically apply power from a clock to execute the gate operation. In contrast, in ballistic RFL gates the bits use no clock reference. The RFL CNOT is not fully ballistic and uses clock SFQ.

BSR gates, as a new development of RFL, are asynchronous. Asynchronous ballistic gates require a stored state such that the ballistic bit can interact with it [\[25,](#page-17-18)[26\]](#page-17-19). By definition, asynchronous reversible logic allows an arbitrary delay time between input bits, as long as it exceeds a minimum delay time. Asynchronous logic thus avoids the requirement of synchronized bits. Moreover, it has the potential for the sequential operation of multiple gates without clocking and this provides an architectural advantage over clocked gates, such as typical gates in RSFQ. In this work, we show simulated operations of the first asynchronous reversible gates, a sequential BSR and a BSR with multiple input ports. The latter gate allows one to write a bit state on one bit line and then transfer it to a second bit line, which is a (different) 2-bit shift operation.

As the BSR operation is unpowered, it relies on the free-scattering dynamics of the input fluxon. The dynamics depend on the difference between the input and stored bit states, such that there are two dynamical types. We choose a combination of dynamical types that are favorable in terms of parameter margins. If the input and stored bit states differ, the dynamics are similar to a 1-bit NOT gate, which uses a resonance for polarity inversion [\[17\]](#page-17-11). If the input and stored bit states are equal, the dynamics are of transmission type, which is simpler than a previously studied ID resonance gate [\[17\]](#page-17-11). This combination improves the parameter margins compared with the 2-bit synchronous IDSN gate [\[18\]](#page-17-13), which combines the NOT and ID dynamical types.

This paper is organized as follows. In Sec. [II,](#page-1-0) we present a 1-input BSR, analyze its steady states, and discuss the operations of the BSR for both a 1-bit and multibit serial shift register. A 2-input BSR is introduced in Sec. [II E](#page-7-0), and Sec. II F gives an overview of the operation margins. In Sec. II G, we investigate the fluxon delay times and give estimates for the timing uncertainty (jitter), induced either by the gate or by thermal fluctuations. In Sec. [III,](#page-10-0) we analyze the BSR dynamics by means of a collective coordinate (CC) model, which is shown to quantitatively describe the BSR dynamics and to help with the interpretation of the gate dynamics. The discussion in Sec. [IV](#page-13-0) includes technical findings on energy efficiency, speed, energy-delay product, and logical depth.

# <span id="page-1-0"></span>**II. BSR CIRCUIT AND OPERATION**

The Anderson-Dynes-Fulton (ADF) flux shuttle is an important historical step in the development of SFQ digital electronics, because it interprets SFQ as bits and it is closely related to RSFQ shift registers even though it predates RSFQ by approximately a decade [\[12\]](#page-17-6). In the ADF flux shuttle, the SFQ can be localized in a potential well generated by device geometry or magnetic fields and can be moved forward to the next potential well by current pulses [\[14\]](#page-17-8).

In contrast, RSFQ shift registers [\[15](#page-17-9)[,16\]](#page-17-10) use dc current bias and SFQ clock signals to forward the bits. As is standard in RSFQ, a data SFQ represents the logic 1-state, while the 0-state is represented by an absence of flux at the same position. In these shift registers, a clock SFQ arriving near a data-storage cell causes a JJ in the storage cell to switch phase by  $2\pi$  if that cell contains a data SFQ. In this case, the data SFQ will be shifted to an adjacent cell. The clock SFQ will progress past the storage cell, regardless of the presence of a data SFQ in it.

In the RSFQ shift registers, and RSFQ gates generally [\[4\]](#page-17-20), the power to move SFQ comes from current biases. As the JJs in RSFQ circuits are critically damped and biased with a bias current  $I_{\text{bias}} \lesssim I_c$  near their critical current  $I_c$ , the  $2\pi$  phase switching is accompanied by an energy dissipation of  $\lesssim I_c\Phi_0$ , where  $\Phi_0$  is the flux quantum. This dissipated energy is of the same order of magnitude as the energy of the logic 1-state. For context, note that a fluxon bit in an LJJ can be related to the SFQ energy in a typical digital cell: the typical bit energy for a SFQ is approximately  $J_c d^2 \Phi_0$  with JJ critical current density  $J_c$  and JJ diameter  $d$ . The energy of a fluxon in a long Josephson junction is approximately  $J_c w \lambda_J \Phi_0$  for a long junction with width *w* (the "short" dimension) and Josephson penetration depth  $\lambda_J$ , where the latter determines the fluxon length.

RFL represents the bit states 0 and 1 by the two possible polarities  $\sigma = \pm 1$  of a fluxon, corresponding to the sign of its flux  $\pm \Phi_0$  and denoted as fluxon (+) and antifluxon (−). Switching between the degenerate bit states (polarity inversion) and other logic operations may be achieved in ballistic gates [\[17,](#page-17-11)[18](#page-17-13)[,27\]](#page-17-21), which are undriven and solely powered by the energy of the input fluxon(s). Previous ballistic reversible gates of RFL have been designed without internal state memory, implying that the operation of a multibit gate requires synchronous input bits. Although specialized store-and-launch gates [\[18,](#page-17-13)[19\]](#page-17-12) have been designed for the purpose of synchronization (and routing), others have advocated for "asynchronous" ballistic reversible gates [\[25,](#page-17-18)[26\]](#page-17-19).

Asynchronous multibit gates have the advantage that the timing of the input bits no longer needs to be precise. They merely have to arrive with a minimum delay time between them to allow quiescence before the next arrival. Asynchronous ballistic gates require an internal state, by which an interaction between subsequent input bits is mediated. In order to generate output that depends on the input of a previous scattering, the internal state has to be changeable by an input bit and has to determine the output state of the ballistic scattering.

The storage cell of the ballistic shift register (BSR) provides exactly such functionality, in that it can store a bit state in the form of a flux quantum with positive or negative flux orientation $S = \pm 1$. As we describe below, the stored state $S$ may change during the scattering dynamics, depending on the input fluxon state $\sigma$. The scattering type (output fluxon state) in turn depends on the current value of $S$ but is independent of the input timing, regardless of which input ports are used.

## **A. Circuit**

The most basic BSR is the 2-port circuit shown in Fig. [1\(a\).](#page-2-0) It consists of a storage cell, one LJJ each for input and output, and a special interface cell between them. The LJJs form a part of the gate and also serve as fluxon input and output channels. The side arms of the interface cell are formed by two JJs with parameters $(\hat{C}_J, \hat{I}_c)$. Each of them terminates one of the LJJs and thus they are referred to as "termination JJs." The upper rails of the LJJs are joined by a negligible inductance (on the upper side of the interface cell), while the lower rails are connected by the so-called "rail JJ" (of the interface cell), with parameters $(C_J^B, I_c^B)$. The rail JJ is also part of the storage cell, which is closed by a parallel inductor $L_s$. Given suitable parameters $(2\pi L_s I_c^B/\Phi_0 \gtrsim 5)$, the storage cell can store one bit of information in the form of a steady circulating current. A clockwise (counterclockwise) circulating current corresponds to a positive (negative) flux orientation $S = 1$ ($S = -1$) and rail-JJ phase $\phi^B \approx 2\pi S$.

<span id="page-2-0"></span>

FIG. 1. (a) The circuit schematic for a 1-input BSR, consisting of one input and one output LJJ (three cells shown for each) connected by a circuit interface, which is made from three capacitance-shunted JJs, where two of these are left and right "termination JJs" with $(\hat{C}_J, \hat{I}_c)$ and the third is the "rail JJ" with $(C_J^B, I_c^B)$. An inductor $L_s$ in parallel with the rail JJ is made to store one SFQ, where a clockwise (counterclockwise) circulating current $I_s$ corresponds to flux state $S = 1$ ($S = -1$) and rail phase $\phi^B \approx 2\pi S$. The parameters of the interface and storage cell are set to enable energy-efficient forward scattering from one LJJ to the other for all combinations of stored flux state $S = \pm 1$ and input flux state $\sigma = \pm 1$, for a range of input velocities. External circuitry coupled inductively to $L_s$ may be optionally used to assist initialization of the BSR (loading), giving $|S| = 1$. (b) The operation table for 1-input BSR characterized by a SWAP operation between the stored and moving flux states: $(S', \sigma') = \text{SWAP}(S, \sigma)$. A schematic illustration of BSR operation is shown for the third row of the table, with $S = -1$, $\sigma = 1$, and the resulting $S' = 1$, $\sigma' = -1$. The LJJ trilayer is illustrated using gray for the superconductor and blue for the tunneling barrier. (c) The operation table for the pioneering ADF flux shuttle [\[13\]](#page-17-7), which is based on the presence of a SFQ as the bit state. A schematic illustration of the shuttle operation is shown for the third row of the table, where a fluxon settles as a static SFQ in a storage cell due to damping (shown as shunt resistors) and is subsequently released as a fluxon by application of a bias current $I_b$.

Similar to the 1-bit ballistic RFL gates, the NOT or ID, the interface of the BSR is designed to enable forward scattering, i.e., starting from a fluxon entering on one LJJ and resulting in a fluxon exiting on the other LJJ. By making the input and output LJJs sufficiently long $(\gtrsim 10\lambda_J)$, we ensure that the ballistic fluxons can move freely. The ballistic gates require specific parameter values in the interface to achieve the operation. The ballistic scattering involves the temporary breaking of the fluxon at the gate interface and a short oscillation of an interface mode. In previous 1-bit ballistic RFL gates, i.e., the NOT and ID, the polarity of the exiting fluxon is determined by the polarity of the incoming fluxon alone. In contrast, the output bit of the BSR is dependent on the stored bit state $S$. The ballistic scattering dynamics generate the regular operations of a shift register, summarized in the table of Fig. [1\(b\)](#page-2-0). One of the four possible operations of the BSR is sketched in Fig. [1\(b\)](#page-2-0). In comparison with the operation of the ADF flux shuttle [\[13](#page-17-7)[,14\]](#page-17-8) [see Fig. [1\(c\)\]](#page-2-0), the BSR uses no external drive power to advance the stored SFQ from the storage cell. Instead, the incoming bit state is swapped efficiently with the stored one in a reversible process.
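The logical content of the operation table can be paraphrased in a few lines of Python. The sketch below models only the state map $(S', \sigma') = \text{SWAP}(S, \sigma)$ and says nothing about the fluxon dynamics that realize it; the function names `bsr_swap` and `shift_sequence` are hypothetical and are introduced here purely for illustration.

```python
from typing import Iterable, List, Tuple


def bsr_swap(stored: int, incoming: int) -> Tuple[int, int]:
    """Logical map of a 1-input BSR: (S', sigma') = SWAP(S, sigma).

    Both arguments are flux polarities in {-1, +1}. The incoming polarity
    becomes the new stored state and the old stored state leaves on the output.
    """
    if stored not in (-1, 1) or incoming not in (-1, 1):
        raise ValueError("polarities must be +1 or -1")
    return incoming, stored


def shift_sequence(stored: int, inputs: Iterable[int]) -> Tuple[int, List[int]]:
    """Feed input polarities one after another; return (final S, list of outputs)."""
    outputs = []
    for sigma in inputs:
        stored, sigma_out = bsr_swap(stored, sigma)
        outputs.append(sigma_out)
    return stored, outputs


if __name__ == "__main__":
    # Enumerate all four combinations of stored and incoming polarity.
    for S in (1, -1):
        for sigma in (1, -1):
            print((S, sigma), "->", bsr_swap(S, sigma))
```

Because the map is a plain swap, applying it with the output fed back as the next input restores the original pair, which is the logical counterpart of the reversibility discussed in the text.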

We refer to the BSR of Fig. [1\(a\)](#page-2-0), where fluxons can arrive on only one input LJJ, as a *1-input* BSR, to distinguish it from a BSR with separate write and read channels (cf. Fig. [5\)](#page-7-1). The circuit dynamics of the 1-input BSR are described by the Lagrangian

$$
\begin{aligned}
\mathcal{L} &= \mathcal{L}_l + \mathcal{L}_r + \mathcal{L}_I, \\
\mathcal{L}_l &= \frac{E_0 a}{\lambda_J} \sum_{n \ge 1} \left[ \frac{1}{2} \frac{(\dot{\phi}_n^{(l)})^2}{\omega_J^2} + \cos \phi_n^{(l)} - \frac{(\phi_{n-1}^{(l)} - \phi_n^{(l)})^2}{2(a/\lambda_J)^2} \right], \\
\mathcal{L}_r &= \frac{E_0 a}{\lambda_J} \sum_{n \ge 1} \left[ \frac{1}{2} \frac{(\dot{\phi}_n^{(r)})^2}{\omega_J^2} + \cos \phi_n^{(r)} - \frac{(\phi_n^{(r)} - \phi_{n-1}^{(r)})^2}{2(a/\lambda_J)^2} \right].
\end{aligned}
\tag{1}
$$

Herein, $\mathcal{L}_{l,r}$ are the Lagrangian components of the left and right LJJ, respectively, and $\mathcal{L}_I$ describes the interface which connects them. The JJs in the discrete LJJs have capacitance and critical current of $(C_J, I_c)$ and each unit cell of length $a$ has the inductance $L$. The characteristic time, length, speed, and energy scales of the LJJ are set by the Josephson plasma frequency, $\omega_J = 2\pi \nu_J = \sqrt{2\pi I_c/(\Phi_0 C_J)}$, the Josephson penetration depth, $\lambda_J = a\sqrt{\Phi_0/(2\pi L I_c)}$, the Swihart velocity, $c = \omega_J \lambda_J$, and the energy scale, $E_0 = I_c \Phi_0 \lambda_J / (2\pi a)$ [a static fluxon in the LJJ has energy $8E_0$; cf. Eq. (10)].
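As a quick numerical illustration of these definitions, the following sketch evaluates the four scales for one set of circuit values. The numbers for $I_c$, $C_J$, $L$, and $a$ are placeholders chosen only to exercise the formulas; they are assumptions and are not taken from this work. The function name `ljj_scales` is likewise hypothetical.

```python
import math

PHI_0 = 2.067833848e-15  # magnetic flux quantum h/(2e) in Wb


def ljj_scales(I_c: float, C_J: float, L: float, a: float) -> dict:
    """Characteristic scales of a discrete LJJ (SI units in, SI units out)."""
    omega_J = math.sqrt(2.0 * math.pi * I_c / (PHI_0 * C_J))      # plasma frequency (rad/s)
    lambda_J = a * math.sqrt(PHI_0 / (2.0 * math.pi * L * I_c))  # penetration depth (m)
    c = omega_J * lambda_J                                       # Swihart velocity (m/s)
    E_0 = I_c * PHI_0 * lambda_J / (2.0 * math.pi * a)           # energy scale (J)
    return {
        "omega_J": omega_J,
        "nu_J": omega_J / (2.0 * math.pi),
        "lambda_J": lambda_J,
        "c": c,
        "E_0": E_0,
        "E_fluxon_rest": 8.0 * E_0,  # energy of a static fluxon
    }


if __name__ == "__main__":
    # Placeholder values (assumptions for illustration only).
    scales = ljj_scales(I_c=10e-6, C_J=200e-15, L=5e-12, a=10e-6)
    for name, value in scales.items():
        print(f"{name:>14s} = {value:.4e}")
```

A useful consistency check is that the product $\omega_J \lambda_J$ reduces to $a/\sqrt{L C_J}$, so the Swihart velocity does not depend on $I_c$.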

In our design, the inductance in the interface cell is assumed to be negligible, as indicated in Fig. [1\(a\)](#page-2-0), and in this situation the phase of the rail JJ of the interface is not independent but fixed by

$$
\phi^B = \phi_L - \phi_R,\tag{2}
$$

$$
\phi_L := \phi_{n=0}^{(l)} \quad \text{and} \quad \phi_R := \phi_{n=0}^{(r)},\tag{3}
$$

where we introduce shorthand notation for the termination JJ phases in Eq. (3). With this approximation, the interface Lagrangian corresponding to Fig. [1\(a\)](#page-2-0) reads

<span id="page-3-2"></span>
$$
\mathcal{L}_{I} = \frac{E_{0}a}{\lambda_{J}} \left\{ \frac{1}{2} \frac{\hat{C}_{J}}{C_{J} \omega_{J}^{2}} \left[ \dot{\phi}_{L}^{2} + \dot{\phi}_{R}^{2} \right] + \frac{1}{2} \frac{C_{J}^{B}}{C_{J}} \frac{(\dot{\phi}_{L} - \dot{\phi}_{R})^{2}}{\omega_{J}^{2}} \right. \\ \left. + \frac{\hat{I}_{c}}{I_{c}} \left[ \cos \phi_{L} + \cos \phi_{R} \right] + \frac{I_{c}^{B}}{I_{c}} \cos(\phi_{L} - \phi_{R}) \right. \\ \left. - \frac{1}{2} \frac{L\lambda_{J}^{2}}{L_{s} a^{2}} (\phi_{L} - \phi_{R} + 2\pi f_{E})^{2} \right\}, \tag{4}
$$

where the parameter $f_E$ quantifies an external flux $f_E \Phi_0$ applied to the storage cell [cf. Fig. [1\(a\)](#page-2-0)]. A finite $f_E$ may be useful during the initialization of the BSR, i.e., the initial loading of a SFQ into the storage cell, but it is not necessary in principle. During regular BSR operations, a SFQ is already stored and $f_E$ is set to zero. In this work, we present results on regular BSR operations (where $f_E = 0$) and initialization results using $f_E = 0$.

#### <span id="page-3-3"></span>**B. Steady states of the circuit (before input)**

<span id="page-3-1"></span>To analyze the bit-storage characteristics of the BSR, we first study the steady states of the BSR circuit (in the absence of a moving fluxon). It is helpful to first compare the BSR to the earlier ballistic gate circuit without a storage cell. Schematically, in the limit of infinite storage cell inductance $L_s \to \infty$, the BSR circuit is equivalent to the circuit of the ID and NOT gates [\[17\]](#page-17-11). The steady states of Eq. [\(1\)](#page-3-1) are then given by uniform phase fields in the left and right LJJ, $\phi_n^{(l)} = 2\pi k_L$ and $\phi_n^{(r)} = 2\pi k_R$ ($k_{L,R} \in \mathbb{Z}$), while the rail phase assumes the value $\phi^B = 2\pi (k_L - k_R)$. Herein, the integers $k_{L,R}$ label the "vacuum" states (ground states) of the LJJ potential, which is $2\pi$ periodic in $\phi$ [\[28\]](#page-17-22). Similar to uncoupled LJJs, all configurations $(k_L, k_R)$ are degenerate here and the dynamics are not dependent on their initial values. When a fluxon from the left LJJ is scattered forward without (with) polarity inversion, it realizes an ID (NOT) gate; it transfers the system from a state with $(k_L, k_R)$ to the state with $(k_L + \sigma, k_R \pm \sigma)$. Note that the dynamics of the NOT gate, but not those of the ID gate, are used below for the BSR.

<span id="page-3-0"></span>In the presence of finite $L_s$ in the BSR, the degeneracy of different configurations $(k_L, k_R)$ is lifted due to the contribution $\propto (\phi_L - \phi_R)^2/(2L_s)$ in the potential [cf. Eq. [\(4\)\]](#page-3-2). Large values of the rail phase $\phi^B = \phi_L - \phi_R$ (and of the vacuum-level difference $2\pi(k_L - k_R)$ to the left and right of the interface) become energetically inaccessible. At finite $|k_L - k_R| > 0$, while the LJJ phases far away from the interface are still confined to their respective vacuum levels $2\pi k_{L,R}$, the LJJ phases near the interface are perturbed. We therefore model the LJJ phases (in the absence of a fluxon) as bound states with evanescent fields of the form

$$
\begin{aligned}
\phi_n^{(l)} &= (\phi_L - 2\pi k_L)e^{-\mu a n} + 2\pi k_L, \\
\phi_n^{(r)} &= (\phi_R - 2\pi k_R)e^{-\mu a n} + 2\pi k_R,
\end{aligned}
\tag{5}
$$

where  $\mu$  is the inverse decay length. Assuming that the bound-state amplitudes  $\phi_{L,R} - 2\pi k_{L,R}$  are small, the corresponding rail phase,  $\phi^B = \phi_L - \phi_R$ , is approximated by the vacuum-level difference,  $\phi^B \approx 2\pi (k_L - k_R) = 2\pi S$ . The flux state *S* in the storage cell is determined by the difference in configuration on the left and right sides of the interface,  $S = k_L - k_R$ .

Inserting Eq. (5) into the Lagrangian (1), the potential can be expressed, in the limit of small bound-state amplitudes, as

$$
\begin{aligned}
\tilde{U}_{(k_L, k_R)} = \frac{E_0 a}{\lambda_J} \Big\{ & -\frac{\hat{I}_c + I_{c, \text{eff}}}{I_c} \left[ \cos \phi_L + \cos \phi_R \right] - \frac{I_c^B}{I_c} \cos(\phi_L - \phi_R) \\
& + \frac{1}{2} \frac{L \lambda_J^2}{L_s a^2} (\phi_L - \phi_R + 2\pi f_E)^2 \\
& + \frac{1}{2} \frac{L \lambda_J^2}{L_{\text{eff}} a^2} \left[ (\phi_L - 2\pi k_L)^2 + (\phi_R - 2\pi k_R)^2 \right] \Big\}.
\end{aligned}
\tag{6}
$$

Referenced from the interface, each LJJ contribution is reduced to an effective JJ and an effective inductance,

$$
I_{c, \text{eff}} = I_c / (e^{2\mu a} - 1), \tag{7}
$$

$$
L_{\text{eff}} = L(e^{\mu a} + 1)/(e^{\mu a} - 1),\tag{8}
$$

where both are in parallel with the corresponding termination JJ $(\hat{I}_c)$. In these expressions, the inverse decay length $\mu$ of the bound state is not yet determined. However, we can estimate $\mu$ from the condition that the bound state fulfills the dispersion relation in the LJJ bulk, $\omega^2 = \omega_J^2 + (2c^2/a^2)\left(1 - \cosh(a\mu)\right)$. Being interested in steady states of the interface, we can set $\omega = 0$ and obtain the estimate $\mu = a^{-1} \cosh^{-1} \left(1 + a^2/(2\lambda_J^2)\right)$.
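The estimate for $\mu$ and the effective elements of Eqs. (7) and (8) depend only on the discreteness ratio $a/\lambda_J$, so they are easy to tabulate. The sketch below is a minimal illustration; the function name `bound_state_parameters` and the sample ratio $a/\lambda_J = 0.5$ are assumptions made here, not values from this work.

```python
import math


def bound_state_parameters(a_over_lambda_J: float):
    """Return (mu*a, I_c_eff/I_c, L_eff/L) for a given discreteness a/lambda_J.

    mu*a follows from the zero-frequency bulk dispersion relation,
    cosh(mu*a) = 1 + (a/lambda_J)^2 / 2; the ratios follow Eqs. (7) and (8).
    """
    if a_over_lambda_J <= 0.0:
        raise ValueError("a/lambda_J must be positive")
    mu_a = math.acosh(1.0 + 0.5 * a_over_lambda_J ** 2)
    ic_eff_over_ic = 1.0 / math.expm1(2.0 * mu_a)                 # Eq. (7)
    l_eff_over_l = (math.exp(mu_a) + 1.0) / math.expm1(mu_a)      # Eq. (8)
    return mu_a, ic_eff_over_ic, l_eff_over_l


if __name__ == "__main__":
    ratio = 0.5  # assumed sample value of a/lambda_J
    mu_a, ic_ratio, l_ratio = bound_state_parameters(ratio)
    print(f"mu*a        = {mu_a:.6f}")
    print(f"I_c,eff/I_c = {ic_ratio:.6f}")
    print(f"L_eff/L     = {l_ratio:.6f}")
```

In the continuum limit $a/\lambda_J \to 0$ the expressions reduce to $\mu \to 1/\lambda_J$, $I_{c,\text{eff}}/I_c \to \lambda_J/(2a)$, and $L_{\text{eff}}/L \to 2\lambda_J/a$, which gives a simple sanity check on any implementation.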

The potential shown in Fig. [2](#page-4-1) is obtained from Eq. (6) by choosing the energy-minimizing configuration, $U_S := \min_{(k_L,k_R)} (\tilde{U}_{(k_L,k_R)})$, for each point $(\phi_L, \phi_R)$. The resulting diamond-shaped domains are labeled in Fig. [2](#page-4-1) by the locally minimizing $(k_L, k_R)$. The potential is $4\pi$ periodic in the sum of the phases $\phi_L + \phi_R$ at constant phase difference $\phi_L - \phi_R$, due to the combined $2\pi$ periodicity in the two components. However, the potential has an approximate parabolic dependence on $\phi^B = \phi_L - \phi_R$. For not too large $\phi^B$, the potential $U_S$ has a local minimum in each of the domains $(k_L, k_R)$ and these steady states correspond to a stored flux state $S = k_L - k_R$. Degenerate global minima are found at $\phi^B = \phi_L - \phi_R = 0$ in the domains with zero stored flux, $(k_L - k_R) = 0$. States with a single stored flux quantum are found in the domains with $|k_L - k_R| = 1$, with the local minima at $\phi^B = \phi_L - \phi_R \approx 2\pi (k_L - k_R) = \pm 2\pi$. From the vertical position of the minima, $\phi_L + \phi_R = 2\pi(k_L + k_R)$, it follows that the bound-state amplitudes on the left and right sides of the interface are equal and opposite, $(\phi_L - 2\pi k_L) + (\phi_R - 2\pi k_R) = 0$.

<span id="page-4-1"></span><span id="page-4-0"></span>

<span id="page-4-2"></span>FIG. 2. The BSR-circuit potential $U_S := \min_{(k_L, k_R)} (\tilde{U}_{(k_L, k_R)})$ in the bound-state approximation [see Eq. (5)] as a function of the termination-JJ phases $\phi_{L,R}$. The potential is $4\pi$ periodic in the phase sum $\phi_L + \phi_R$ and has an approximate parabolic dependence on $\phi_L - \phi_R = \phi^B$ due to energy storage in $L_s$. Each configuration $(k_L, k_R)$ determines a diamond-shaped domain. For $\phi^B \lesssim 4\pi$, the diamonds contain a well that supports a stored flux state $S = k_L - k_R$. Steady states in wells with $S = \pm 1$ (black points) have energy $E_S = 2.5E_0 \approx 2\pi^2 L\lambda_J/(L_s a)S^2 E_0$. Additional equipotential lines are shown at $E_S + E_{\text{fl}}$, for fluxon energy $E_{\text{fl}} = 10E_0$, indicating the $\phi_{L,R}$ range accessible for an incident fluxon with velocity $v = 0.6c$. Four trajectories, $\phi_L(t) = \phi_0^{(l)}(t)$ and $\phi_R(t) = \phi_0^{(r)}(t)$, are shown (red, blue, orange, and light-blue points) for different cases of stored SFQ polarity $S = \pm 1$ and incident fluxon. These are the termination-JJ phases obtained from the circuit simulations. The solid (dashed) arrows indicate the resulting transitions to another well, for $\sigma = 1$ ($\sigma = -1$). The flux state in the new well is $S' = \pm S$ if $S = \pm \sigma$. The system parameters are dimensioned such that the potential $U_S$ allows these transitions and also the transition from $S = k_L - k_R = 0$ to $|S| = 1$ for the initialization (SFQ loading) of the BSR. Note that $U_S$ assumes LJJ fields of the bound-state form [see Eq. (5)] in the absence of a fluxon. The superimposed scattering trajectories are therefore not described by $U_S$ alone. The BSR parameters and parameter ranges are given on the left-hand side of Table [I.](#page-8-2)
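The construction of $U_S$ as a pointwise minimum over configurations can be sketched directly from Eq. (6). In the following minimal example all coefficients are dimensionless placeholders (they are assumptions and not the parameters of Table I), the energy is expressed in units of $E_0 a/\lambda_J$, and the names `u_tilde` and `u_s` are hypothetical.

```python
import math
from itertools import product

TWO_PI = 2.0 * math.pi

# Placeholder coefficients (NOT the values of Table I).
PARAMS = {
    "ic_hat": 2.0,      # \hat{I}_c / I_c
    "ic_eff": 1.0,      # I_{c,eff} / I_c
    "ic_rail": 3.0,     # I_c^B / I_c
    "ls_coeff": 0.3,    # L lambda_J^2 / (L_s a^2)
    "leff_coeff": 0.5,  # L lambda_J^2 / (L_eff a^2)
    "f_E": 0.0,         # external flux in units of Phi_0
}


def u_tilde(phi_L, phi_R, k_L, k_R, p=PARAMS):
    """Eq. (6) for one configuration (k_L, k_R), in units of E_0 * a / lambda_J."""
    cos_term = -(p["ic_hat"] + p["ic_eff"]) * (math.cos(phi_L) + math.cos(phi_R))
    rail_term = -p["ic_rail"] * math.cos(phi_L - phi_R)
    store_term = 0.5 * p["ls_coeff"] * (phi_L - phi_R + TWO_PI * p["f_E"]) ** 2
    bound_term = 0.5 * p["leff_coeff"] * (
        (phi_L - TWO_PI * k_L) ** 2 + (phi_R - TWO_PI * k_R) ** 2
    )
    return cos_term + rail_term + store_term + bound_term


def u_s(phi_L, phi_R, p=PARAMS, k_max=3):
    """Return (U_S, (k_L, k_R)): the minimum of Eq. (6) over |k_L|, |k_R| <= k_max."""
    configs = product(range(-k_max, k_max + 1), repeat=2)
    best = min(configs, key=lambda k: u_tilde(phi_L, phi_R, k[0], k[1], p))
    return u_tilde(phi_L, phi_R, best[0], best[1], p), best


if __name__ == "__main__":
    value, (k_L, k_R) = u_s(TWO_PI, 0.0)
    print("U_S =", value, "domain =", (k_L, k_R), "S =", k_L - k_R)
```

Shifting both phases by $2\pi$ changes the sum $\phi_L + \phi_R$ by $4\pi$ at fixed $\phi^B$ and moves the minimizing configuration from $(k_L, k_R)$ to $(k_L + 1, k_R + 1)$ without changing $U_S$, which reproduces the periodicity stated above.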

During normal operation of the BSR, the parameters satisfy $f_E = 0$, $\max(I_c^B, \hat{I}_c, I_{c, \text{eff}}) \gg \Phi_0/(2\pi L_s)$, and $|S| \le 1$, where the bound-state amplitudes are small and the phase fields in the left and right LJJs are nearly uniform. For a single stored flux quantum $S$ in the BSR, we can thus approximate $(\phi_L, \phi_R)$ in Eq. [\(5\)](#page-4-0) as $(2\pi k_L, 2\pi k_R)$. From Eq. (6), it follows that the stored energy relative to the empty BSR $(S = 0)$ is

$$
E_S \approx \frac{2\pi^2 L \lambda_J}{L_s a} (k_L - k_R)^2 E_0 = \frac{2\pi^2 L \lambda_J}{L_s a} S^2 E_0. \tag{9}
$$

If the BSR is initialized with  $|S| \leq 1$ , an incoming fluxon with velocity  $v$  and energy

$$
E_{\rm fl}(v) = 8E_0 \left(1 - (v/c)^2\right)^{-1/2} \tag{10}
$$

can transfer the BSR into a new stored flux state with energy $E'_S \le E_S + E_{\text{fl}}(v)$. The equipotential lines in Fig. [2](#page-4-1) indicate the corresponding $\phi_{L,R}$ range accessible from $|S| = 1$. Energetically, it is not possible to load a second SFQ ($|S| = 2$) into the storage cell, whereas transitions to other states with $|S| = 1$ or $S = 0$ are energetically possible.
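The energy bookkeeping of Eqs. (9) and (10) can be reproduced with a short calculation. The sketch below works in units of $E_0$ and uses the values quoted in the caption of Fig. 2 ($E_S = 2.5E_0$ for $|S| = 1$ and $v = 0.6c$); the helper names and the variable `coupling`, standing for the ratio $L\lambda_J/(L_s a)$, are hypothetical.

```python
import math


def fluxon_energy(v_over_c: float) -> float:
    """Eq. (10): fluxon energy in units of E_0 for speed v/c (|v| < c)."""
    if abs(v_over_c) >= 1.0:
        raise ValueError("fluxon speed must be below the Swihart velocity")
    return 8.0 / math.sqrt(1.0 - v_over_c ** 2)


def stored_energy(S: int, coupling: float) -> float:
    """Eq. (9): stored energy in units of E_0, with coupling = L*lambda_J/(L_s*a)."""
    return 2.0 * math.pi ** 2 * coupling * S ** 2


if __name__ == "__main__":
    # Infer the coupling ratio from E_S = 2.5 E_0 at |S| = 1 (caption of Fig. 2).
    coupling = 2.5 / (2.0 * math.pi ** 2)
    e_fl = fluxon_energy(0.6)
    e_s = stored_energy(1, coupling)
    print("L*lambda_J/(L_s*a) =", coupling)
    print("E_fl(0.6c)/E_0     =", e_fl)
    print("(E_S + E_fl)/E_0   =", e_s + e_fl)
```

This only restates the available energy $E_S + E_{\text{fl}}$; which final states are actually reachable is decided by the full potential of Fig. 2 and by the dynamics, not by this estimate alone.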

#### **C. Fluxon-scattering dynamics**

Figure [3](#page-5-1) illustrates the BSR operation, where each of the four subfigures shows the circuit simulations for two consecutive input fluxons. In all four cases, the BSR is assumed to initially contain a stored flux quantum $S = -1$. This means that the circuit is initialized in a bound state of the form of Eq. (5), with $(k_L, k_R) = (0, 1)$ and corresponding steady-state values of $\phi_{L,R}$. The incoming fluxon(s) are treated in simulation as additional contributions to the initial phase and voltage distribution in the left LJJ, far away from the interface. An input fluxon (antifluxon), which has positive (negative) polarity $\sigma = 1$ $(-1)$, is parametrized by the ideal phase distribution $\phi(x, t) = 4 \arctan (\exp(-\sigma (x - vt)/W))$ with velocity $v$ and width $W = \lambda_J (1 - v^2/c^2)^{1/2}$. This corresponds to a positive (negative) voltage pulse with maximum (minimum) $\pm 2\Phi_0 \nu_J (v/c)(1 - v^2/c^2)^{-1/2}$. We compute the fluxon dynamics from numerical integration of the $(N_l + N_r + 3)$ classical circuit equations of motion for the $(N_l + N_r)$ JJs in the LJJs, together with the termination and rail JJs of the interface. The left-hand panels of Fig. [3](#page-5-1) show the resulting JJ voltages $V_n^{(l,r)}$ at positions $x_n = -a(n + 1/2) < 0$ in the left LJJ and $x_n = +a(n + 1/2) > 0$ in the right LJJ ($n = 0, 1, 2, \ldots$). The position of the interface is $x = 0$. The right-hand panels of Fig. [3](#page-5-1) show the evolution of the rail phase $\phi^{B}(t)$ from the initial value $\phi^B(0) \approx -2\pi$.
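The ideal fluxon profile used as the initial condition is straightforward to evaluate. The sketch below is a minimal illustration in dimensionless units ($\lambda_J = c = 1$, so time is measured in $1/\omega_J$); it is not the simulation code of this work, and the function names are hypothetical. With these units the phase rate $\dot{\phi}$ equals the junction voltage in units of $\Phi_0 \nu_J$.

```python
import math


def fluxon_phase(x: float, t: float, sigma: int, v: float) -> float:
    """Ideal (anti)fluxon phase 4*arctan(exp(-sigma*(x - v*t)/W)), lambda_J = c = 1."""
    W = math.sqrt(1.0 - v ** 2)  # Lorentz-contracted width
    return 4.0 * math.atan(math.exp(-sigma * (x - v * t) / W))


def fluxon_phase_rate(x: float, t: float, sigma: int, v: float) -> float:
    """Analytic time derivative of fluxon_phase: 2*sigma*v / (W*cosh((x - v*t)/W))."""
    W = math.sqrt(1.0 - v ** 2)
    return 2.0 * sigma * v / (W * math.cosh((x - v * t) / W))


if __name__ == "__main__":
    v = 0.6
    # Phase far to the left, at the center, and far to the right of a fluxon.
    for x in (-20.0, 0.0, 20.0):
        print(x, fluxon_phase(x, 0.0, +1, v))
    # Peak voltage in units of Phi_0 * nu_J, for fluxon and antifluxon.
    print(fluxon_phase_rate(0.0, 0.0, +1, v), fluxon_phase_rate(0.0, 0.0, -1, v))
```

For a fluxon ($\sigma = 1$) moving to the right, the phase at a fixed junction rises from $0$ to $2\pi$ as the fluxon passes, and the peak of the rate at the fluxon center agrees with the voltage extremum $\pm 2\Phi_0 \nu_J (v/c)(1 - v^2/c^2)^{-1/2}$ quoted above.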

Figure  $3(a)$  shows, at the earliest times, a first fluxon with polarity  $\sigma = +1$  traveling in the left LJJ with nearly constant speed  $v_0 = 0.6c$ . As it reaches the interface, the fluxon breaks into two parts, i.e., its phase and voltage fields become discontinuous. In the process, the energy of the fluxon is coherently transferred to a localized excitation involving the left and right LJJs in form of timedependent evanescent fields. The localized excitation lasts long enough for its own brief oscillation and subsequently generates a large field profile in the right LJJ, which eventually moves as a free fluxon in the right LJJ, away from

<span id="page-5-2"></span><span id="page-5-1"></span><span id="page-5-0"></span>

FIG. 3. Simulated operations of the shift register initialized with the stored flux quantum  $S = -1$ , under four different input sequences of two fluxons:  $(\sigma_1, \sigma_2) = (1, 1), (1, -1), (-1, 1), (-1, -1).$  The left-hand panels show the dynamics of JJ voltages  $V_n$  at positions  $x_n \leq 0$ in the left (input) and right (output) LJJ, respectively. The color scale shows blue tracks for fluxons ( $\sigma = 1$ ) and orange for antifluxons ( $\sigma = -1$ ). The right-hand panels show the evolution of the rail-JJ phase  $\phi^B$  from initial state  $\phi^B \approx 2\pi S$ . In cases where  $\sigma = -S$ , the fluxon scatters forward as an output fluxon with inverted polarity,  $\sigma' = S$ , and the stored state after the scattering becomes  $S' = \sigma$ . In the other cases,  $\sigma = S$ , the fluxon is simply transmitted with a short delay and the stored state remains unchanged. All cases fulfill  $(S', \sigma') = \text{SWAP}(S, \sigma)$ , thus generating the state map of a 1-bit shift register [cf. Fig.  $1(b)$ ]. The BSR parameters, as listed in Table [I,](#page-8-2) are the same as in Fig. [2](#page-4-1) and input fluxons enter with  $v_0 = 0.6c$ .

the influence of the interface. During the whole process, the phase $\phi^B$ of the rail JJ changes monotonically from $\phi^B \approx -2\pi$ to approximately $2\pi$. This $4\pi$ phase change indicates the simultaneous transfer of the (positive) SFQ state $\sigma = 1$ of the input fluxon into the storage cell and of the initially stored (negative) SFQ $S = -1$ out of the storage cell. Thus, after the scattering, the new orientation of the stored flux quantum is $S' = 1$ and the output fluxon carries the negative SFQ state, $\sigma' = -1$, as indicated by the negative sign of the voltage peak. The fluxon-scattering dynamics in this case are similar to those of the fundamental (1-bit) NOT gate [\[17\]](#page-17-11) (without a storage cell).

When the second input fluxon with $\sigma = 1$ arrives (Fig. [3(a)](#page-5-1), $\nu_J t \approx 12$), the input SFQ state now equals the stored SFQ state, $S = 1$. The resulting scattering dynamics at the interface therefore differ significantly from those of the preceding fluxon. The interface here acts mostly as a low potential barrier by which the fluxon is slowed down temporarily while retaining its fluxon identity, with unchanged polarity. During the fluxon transmission, the rail JJ is only weakly excited away from $\phi^B \approx 2\pi$, indicating that no significant flux transfer occurs. Accordingly, both the state $\sigma$ of the fluxon and the stored flux state $S$ are unchanged in this process. While the result of the fluxon scattering here is the same as in the fundamental ballistic ID gate (the polarity of the outgoing fluxon is identical to that of the incoming one), the scattering dynamics are different: here, the fluxon retains its topological identity throughout the process, whereas in the fundamental ID gate, it breaks up into two partial fluxons at the interface and generates a large localized oscillation as a result (which has a longer duration compared with the temporary oscillation in the fundamental NOT gate). The difference of the transmission-type dynamics of the BSR compared with the dynamics of an actual ID gate originates from the added term $(\phi^B)^2/(2L_s)$ in the interface potential [see Eq. [(4)](#page-3-2)]. It limits the $\phi^B$ range accessible with the initial energy $E_{\text{fl}}$ of the fluxon (see Fig. [2](#page-4-1) and the discussion in Sec. [II B](#page-3-3)). As a result, the rail phase in the BSR cannot increase from an initial value $\phi^B \approx 2\pi$ by $\Delta \phi^B \approx +4\pi$. The transmission-type dynamics create an advantage for the margins of the BSR (see Sec. [II F](#page-8-0)) relative to the fundamental ID gate, which has somewhat sensitive margins in comparison with the fundamental NOT gate due to the longer resonant oscillation. In summary, the inductor $L_s$ enables bit storage, changes the dynamics relative to previous RFL gates, and improves operation margins compared to the fundamental ID gate.

Figure [3](#page-5-1) demonstrates that, for all state pairs $(S, \sigma)$, the scattering dynamics result in a new state pair that is related to the old one by a SWAP operation, $(S', \sigma') = \text{SWAP}(S, \sigma)$. The ballistic scattering dynamics in the BSR circuit thus generate the state map of a 1-bit shift register [see Fig. [1(b)](#page-2-0)]. As described above, two different types of scattering dynamics are involved and in both types the SWAP happens with almost ideal efficiency (cf. Sec. [II F](#page-8-0)).
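
The logical content of the SWAP state map can be sketched in a few lines. This is only a bookkeeping model of the bit states (the names `bsr_step` and `run_bsr` are hypothetical), not a simulation of the fluxon dynamics.

```python
def bsr_step(S, sigma):
    """One BSR operation: (S', sigma') = SWAP(S, sigma)."""
    return sigma, S

def run_bsr(S, inputs):
    """Feed a sequence of input polarities into a 1-bit BSR.

    Returns the list of output polarities and the final stored state.
    """
    outputs = []
    for sigma in inputs:
        S, sigma_out = bsr_step(S, sigma)
        outputs.append(sigma_out)
    return outputs, S

if __name__ == "__main__":
    # The four two-fluxon input sequences of Fig. 3, all starting from S = -1.
    for seq in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        print(seq, "->", run_bsr(-1, seq))
```

Each output bit is the previously stored bit, so the output sequence is the input sequence delayed by one bit, which is the defining behavior of a 1-bit shift register.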

The BSR operations can be understood as interwell transitions in the circuit potential $U_S$, induced by the incoming fluxon. This is illustrated in Fig. [2](#page-4-1) by the trajectories $(\phi_L, \phi_R)(t)$ (data points), where $\phi_{L,R}(t)$ are taken from the circuit simulation. Note that the circuit potential shown in Fig. [2](#page-4-1) assumes that the LJJ fields have the form of a bound state [see Eq. (5)] but does not take the fluxon into account. For example, a fluxon with $\sigma = 1$ leads to a transition (red) from the initial state in the well with $(k_L, k_R) = (0, 1)$ to the well with $(k_L, k_R) = (1, 0)$ and this changes the stored flux state $S = -1 \rightarrow 1$. In contrast, a fluxon with $\sigma = -1$ induces an $S$-preserving transition (blue), namely, to the well with $(k_L, k_R) = (-1, 0)$, which is fully equivalent to the initial well. The underlying circuit dynamics for these two processes correspond to the first fluxon scattering in Figs. [3(a)](#page-5-1) and [3(c)](#page-5-1), respectively. Equivalent dynamics are observed for an initial stored flux state $S = 1$, e.g., initially $(k_L, k_R) = (1, 0)$, if the polarity of the incoming fluxon is inverted at the same time. Thus, an incoming fluxon with $\sigma = -1$ ($\sigma = 1$) induces a transition that inverts (preserves) $S$, as shown by the orange (light-blue) trajectories.

The dynamics shown in Fig. [3](#page-5-1) illustrate the regular BSR operations, where initially a flux quantum is already stored in the storage cell. Without this initialization, the BSR will not perform all the intended reversible operations. To initialize a BSR, a SFQ can be loaded into the empty storage cell by sending in a fluxon on the bit line. As the interface cell is designed with minimal inductance, it cannot hold a SFQ. Therefore, once the fluxon is stopped near the interface, its flux is transferred into the storage cell. We find that it is possible to load the BSR in this way with a SFQ, using no external flux ($f_E = 0$) and a fluxon of nominal energy $E_{\text{fl}} = 10E_0$. A lower-energy fluxon ($E_{\text{fl}} < 10E_0$) might be better for loading because less excess energy would have to dissipate prior to reaching a quiescent state, but due to a potential barrier, a very slow fluxon reflects from the interface instead of trapping. We find that the potential barrier can be lowered by applying an external flux ($f_E \approx 0.25$) and in that case a low-energy fluxon ($E_{\text{fl}} \approx 8.2E_0$) is successfully loaded into the storage cell.

The above initialization procedure seems suitable for individual BSRs. Another procedure may be favorable in a large circuit with many BSR gates. A suitably designed circuit could in principle be made to trap flux solely in the BSR storage cells. To initialize many BSR gates in such a circuit, one would cool through the superconducting transition in a magnetic field and then turn off the field.

#### **D. Sequentially arranged shift registers**

A multibit shift register can be constructed from sequentially arranged BSRs, constituting a serial-in–serial-out (SISO) register. As an example, Fig. [4](#page-7-2) shows a 2-bit serial shift register and its dynamics. The two bits are stored in one of four different configurations, $(S_1, S_2) = (-1, -1)$, $(-1, 1)$, $(1, -1)$, and $(1, 1)$. Figures [4(b)](#page-7-2)–[4(e)](#page-7-2) show the gate dynamics for each of these initial configurations and a single input fluxon, here with $\sigma = +1$. The two BSRs are located at $x = 0$ and $x \approx 15\lambda_J$ (separated by $N + 2 = 40$ LJJ cells). As in Fig. [3](#page-5-1), the stored flux quanta can be inferred from the values of the rail-JJ phases $\phi^{B1}$ and $\phi^{B2}$ in the right-hand panels of each subfigure, using that $\phi^{Bi} \approx 2\pi S_i$ ($i = 1, 2$). The operation of the entire 2-bit shift register is powered by the energy of the input fluxon, which loses only a fraction of its kinetic energy in each of the scatterings. The numbers printed in the left-hand panels are the output-to-input velocity ratios after each scattering, $v_1/v_0$ and $v_2/v_1$, where again we use initial velocity

<span id="page-7-2"></span>

FIG. 4. A 2-bit serial-in–serial-out (SISO) shift register (a) and its dynamics (b)–(e) for a single input fluxon, $\sigma = +1$, and four different initial configurations of the two stored bits: $(S_1, S_2) = (-1, -1)$, $(-1, 1)$, $(1, -1)$, and $(1, 1)$. As in Fig. [3](#page-5-1), the left-hand panels show the JJ voltages $V_n$ in the LJJs, where a fluxon ($\sigma = 1$) is seen as a blue track and an antifluxon ($\sigma = -1$) is seen as an orange track. The two BSRs are located at $x_n = 0$ and at $x_n = (N + 2)a = 40a \sim 15\lambda_J$, respectively, although smaller distances are also possible. The right-hand panels show the evolution of the rail-JJ phases $\phi^{Bi}$ of the two BSRs ($i = 1, 2$). The numbers printed in each of the left-hand panels are the relative speed $v_1/v_0$ ($v_2/v_1$) after passing through the first (second) BSR. The BSR parameters are the same as in Fig. [3](#page-5-1).

 $v_0 = 0.6c$ , corresponding to an initial energy of  $E_f(v_0) =$ 10*E*0. The lowest velocity ratio (0.91) corresponds to 95% energy conservation according to Eq.  $(10)$ .

For each scattering type (NOT and transmission), both stages of the 2-bit shift register give approximately the same velocity ratios (approximately 0.91 for NOT and approximately 0.95 for transmission), corresponding to those of a 1-bit BSR (see Sec. [II F](#page-8-0)). The observed small variability in forward-scattering efficiency at the first and second BSR [e.g., between 0.95 and 0.96 for the two consecutive transmissions in Fig. [4(e)](#page-7-2)] can be attributed to the presence of fluctuations in the connecting LJJ and at the second BSR prior to the fluxon arrival there. These fluctuations are emitted from the first BSR during the first scattering event.

Due to a relatively sharp lower cutoff velocity of the transmission dynamics of one BSR [cf. Fig. [6(f)](#page-8-3)], we believe that $k = 2$ may be the maximum number of sequentially arranged BSRs for a realistic circuit design without added power. It is beyond the scope of the paper to discuss how to reach a $k$-bit sequential shift register with larger $k$ but we plan to address this in future work.
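
The following sketch illustrates, under a deliberately crude assumption, why few stages can be chained without added power. It applies the SWAP map stage by stage and multiplies nominal velocity ratios (0.91 for NOT, 0.95 for transmission), assuming these ratios do not depend on the input velocity, which the simulations do not guarantee. The names `siso_pass` and `above_cutoff` are hypothetical; the cutoff $0.53c$ is the lower velocity limit quoted in the text for transmission-type dynamics.

```python
NOT_RATIO = 0.91           # nominal v_out / v_in for NOT-type scattering
TRANSMISSION_RATIO = 0.95  # nominal v_out / v_in for transmission-type scattering
CUTOFF = 0.53              # quoted lower input-velocity limit, in units of c

def siso_pass(stored, sigma, v0=0.6):
    """Send one fluxon through sequentially arranged BSRs.

    Returns (output polarity, new stored bits, velocity after each stage).
    Velocity ratios are assumed independent of input velocity (a simplification).
    """
    stored = list(stored)
    velocities = [v0]
    for i, S in enumerate(stored):
        ratio = TRANSMISSION_RATIO if sigma == S else NOT_RATIO
        stored[i], sigma = sigma, S  # SWAP of stored and moving bit
        velocities.append(velocities[-1] * ratio)
    return sigma, stored, velocities

def above_cutoff(v):
    """True if a fluxon of velocity v (units of c) exceeds the quoted cutoff."""
    return v >= CUTOFF

if __name__ == "__main__":
    for S1, S2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        out, new, vel = siso_pass((S1, S2), +1)
        print((S1, S2), "->", out, new, [round(v, 3) for v in vel],
              "third stage feasible:", above_cutoff(vel[-1]))
```

In this simplified picture, a fluxon that has undergone two NOT-type scatterings leaves the second stage below the quoted cutoff, in line with the expectation that $k = 2$ may be the practical maximum.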

#### <span id="page-7-0"></span>**E. 2-input shift register**

The 1-bit BSR, shown in Fig. [1(a)](#page-2-0), has one input LJJ and one output LJJ. Together, they form a single channel called a bit line for the forward-scattering fluxon. This *1-input* 1-bit BSR may be generalized to a *2-input* 1-bit BSR, where a storage cell is shared between two such scattering channels, as shown in Fig. [5](#page-7-1). We verify in simulations that the 2-input structure also acts as an energy-efficient 1-bit BSR, using the same interface parameters as for the 1-input 1-bit BSR (cf. Table [I](#page-8-2)). For the operation of this BSR, it is irrelevant which of the two input LJJs a fluxon is sent in on: the dynamics for both input cases are equivalent and are qualitatively equivalent to the dynamics of the 1-input BSR. In the 2-input device, the role of the rail-JJ phase $\phi^B$ of the 1-input BSR is taken over by the phase difference of the two rail JJs, $\phi^B \rightarrow \phi^B - \phi^C$. Motivated by the small difference in parameter margins between the 1-input and the 2-input versions of the BSR (cf. Table [I](#page-8-2)), we expect that a version with many inputs would also operate. This implies that a stored bit of information could be routed to one of many outputs.

Despite the shared storage cell, there is no strong dynamic coupling between the upper and lower parts of the gate. By that, we mean that even during the NOT-type scattering, an input fluxon on the upper (lower) LJJ induces a large phase change of  $4\pi$  only in the adjacent  $\phi^B$  ( $\phi^C$ ) and a relatively small temporary excitation in  $\phi^C$  ( $\phi^B$ ), resulting in an output fluxon only on the upper (lower) output LJJ. A similar observation holds for the transmission-type scattering; however, in this case no significant phase change takes

<span id="page-7-1"></span>

FIG. 5. The circuit schematic of the 2-input shift register, where the upper and lower LJJ pairs form separate fluxon-scattering channels (bit lines) with a shared storage cell between them. The inductance of the storage cell is $L_s$ and the interface cells of both scattering channels are symmetric. Efficient BSR operation takes place with parameter values given in Fig. [3](#page-5-1) and operation margins given on the right-hand side of Table [I](#page-8-2).

<span id="page-8-2"></span>TABLE I. The interface parameters and margins for 1-input BSR [see Fig. [1\(a\)\]](#page-2-0) and 2-input BSR [see Fig. [5\]](#page-7-1). The margins are defined by a required output-to-input velocity ratio  $v_f/v_0 \ge 0.6$  (cf. Fig. [6\)](#page-8-3), corresponding to an energy efficiency  $E_{\rm fl}(v_f)/E_{\rm fl}(v_0) \ge 0.86$  for initial  $v_0 = 0.6c$ . This condition is met by all regular BSR operations, i.e., for any combinations of  $S = \pm 1$  and  $\sigma = \pm 1$ . In case of the 2-input BSR, it is also independent of the choice of input LJJ. In both BSR types, the parameters allow for a velocity retention up to  $v_f/v_0 = 0.91$  (cf. Fig. [6\)](#page-8-3).

| Parameter $p$ | $p_0$ | 1-input BSR: $(p_{\min} - p_0)/p_0$ (%) | 1-input BSR: $(p_{\max} - p_0)/p_0$ (%) | 1-input BSR: $\Delta p/p_0$ (%) | 2-input BSR: $(p_{\min} - p_0)/p_0$ (%) | 2-input BSR: $(p_{\max} - p_0)/p_0$ (%) | 2-input BSR: $\Delta p/p_0$ (%) |
|-------------------|-----|-------|--------|-----|-------|--------|-----|
| $C_J^B/C_J$       | 7.5 | $-31$ | $+61$  | 92  | $-35$ | $+56$  | 91  |
| $\hat{C}_J/C_J$   | 5.4 | $-39$ | $+68$  | 107 | $-37$ | $+79$  | 116 |
| $I_c^B/I_c$       | 4.7 | $-29$ | $+18$  | 47  | $-30$ | $+16$  | 46  |
| $\hat{I}_c/I_c$   | 1.6 | $-90$ | $+16$  | 106 | $-78$ | $+20$  | 98  |
| $L_s/L$           | 20  | $-37$ | $+130$ | 167 | $-35$ | $+124$ | 159 |

place even in the adjacent rail JJ. With this property, the 2-input BSR may be used as a SFQ memory with separate write and read lines.

### <span id="page-8-0"></span>**F. Margins**

An optimal set of circuit parameters for an energy-efficient BSR is given by the first and second columns in Table [I](#page-8-2). These parameters optimize the elastic nature of both scattering types (NOT and transmission) of the BSR operation, such that the dominant fraction of the energy of the input fluxon is conserved in the forward-scattered fluxon. The resulting output-to-input velocity ratio $v_f/v_0$ of the optimized dynamics amounts to 0.91 and 0.95, respectively. For an input fluxon with $v_0 = 0.6c$, the average energy efficiency of the BSR therefore is 96% according to Eq. (10). Figure [6](#page-8-3) shows the output-to-input velocity ratio under variations of different parameters. Setting the minimum ratio $v_f/v_0$ to 0.6 (corresponding to an energy efficiency $E_{\text{fl}}(v_f)/E_{\text{fl}}(v_0) \ge 0.86$ for initial $v_0 = 0.6c$), we find the operation margins for the BSR, as shown in the next three columns in Table [I](#page-8-2). The current limiting factor of the BSR design, as shown in Fig. [6(f)](#page-8-3), is the somewhat restricted range of input velocities for the transmission-type BSR dynamics. In this case, the potential barriers of the interface, which are proportional to $I_c^B$ and $\hat{I}_c$, impose a sharp lower operation limit of $v_0 \geq 0.53c$, although other parameters can be used to reduce this lower-velocity limit. Of the parameter margins, that of $I_c^B$ is the smallest, with a range of 47%.
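
The quoted energy efficiencies can be checked from the velocity ratios. Equation (10) is not reproduced in this section, so the sketch below assumes the standard relativistic fluxon energy $E_{\text{fl}}(v) = 8E_0 (1 - v^2/c^2)^{-1/2}$, which is consistent with the stated value $E_{\text{fl}}(0.6c) = 10E_0$. The function names are illustrative.

```python
import math

def fluxon_energy(v_over_c, E0=1.0):
    """Assumed form of Eq. (10): E_fl(v) = 8*E0 / sqrt(1 - v^2/c^2)."""
    return 8.0 * E0 / math.sqrt(1.0 - v_over_c**2)

def energy_efficiency(velocity_ratio, v0_over_c=0.6):
    """E_fl(v_f) / E_fl(v_0) for a given output-to-input velocity ratio."""
    return fluxon_energy(velocity_ratio * v0_over_c) / fluxon_energy(v0_over_c)

if __name__ == "__main__":
    eff_not = energy_efficiency(0.91)    # NOT-type scattering
    eff_trans = energy_efficiency(0.95)  # transmission-type scattering
    print("NOT:", eff_not, "transmission:", eff_trans)
    print("average:", 0.5 * (eff_not + eff_trans))
    print("margin threshold (ratio 0.6):", energy_efficiency(0.6))
```

Under this assumption, the ratios 0.91 and 0.95 correspond to efficiencies of roughly 95% and 97%, averaging to about 96%, and the margin threshold $v_f/v_0 = 0.6$ corresponds to about 0.86.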

In addition to the variation of interface parameters presented in Fig. [6](#page-8-3), we study gate robustness under variation of LJJ parameters. The resulting margins are generally favorable too, with the smallest range of 66% appearing under variations of the LJJ cell inductances $L$, as shown in Appendix [A](#page-15-0).

The margins of the 2-input BSR, shown in the last three columns of Table [I](#page-8-2), are almost the same as those of the 1-input BSR. This is consistent with our earlier observation that the dynamics on each (upper or lower) bit line are only very weakly affected by the presence of the other bit line. As long as there is no excitation on the extra bit line, it mainly acts as an inductance added to the storage loop. Therefore, we expect that a BSR gate with more bit lines (a $k$-input BSR) will operate similarly well.
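
As a small worked example of how to read Table I, the sketch below converts the relative margins of the 1-input BSR into absolute parameter ranges and identifies the tightest margin. The dictionary keys are plain-text labels chosen here for illustration; the numbers are taken from Table I.

```python
# (p0, lower margin in %, upper margin in %) for the 1-input BSR, from Table I
MARGINS_1_INPUT = {
    "C_J^B/C_J": (7.5, -31, 61),
    "C_J_hat/C_J": (5.4, -39, 68),
    "I_c^B/I_c": (4.7, -29, 18),
    "I_c_hat/I_c": (1.6, -90, 16),
    "L_s/L": (20.0, -37, 130),
}

def absolute_range(p0, lo_pct, hi_pct):
    """Convert relative margins (in percent) into (p_min, p_max)."""
    return p0 * (1 + lo_pct / 100.0), p0 * (1 + hi_pct / 100.0)

def margin_width(lo_pct, hi_pct):
    """Total margin Delta p / p0 in percent."""
    return hi_pct - lo_pct

def tightest(margins):
    """Name of the parameter with the smallest total margin."""
    return min(margins, key=lambda name: margin_width(*margins[name][1:]))

if __name__ == "__main__":
    for name, (p0, lo, hi) in MARGINS_1_INPUT.items():
        print(name, absolute_range(p0, lo, hi), margin_width(lo, hi))
    print("tightest margin:", tightest(MARGINS_1_INPUT))
```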

### <span id="page-8-1"></span>**G. Asynchronous gate timing**

In most SFQ logic, including RSFQ and its descendants, the gates are clocked and the SFQ data bit is processed in gates once the clock pulse arrives. Although data bits do not need to arrive at a precise time, the gates are still considered synchronous because the data SFQs must be processed with a clock pulse.

<span id="page-8-3"></span>

FIG. 6. The margins of 1-input BSR: the output-to-input velocity (retention) ratio,  $v_f/v_0$ , (a)–(e) for  $v_0/c = 0.6$ , as a function of varied interface parameters,  $C_J^B$ ,  $\hat{C}_J$ ,  $L_s$ ,  $I_c^B$ , and  $\hat{I}_c$ , respectively; (f) for fixed interface parameters but varied initial velocity  $v_0$ . In (a)–(f), all parameters except the varied one are kept constant at values given in Table [I.](#page-8-2) The error bars mark the amplitudes of velocity oscillations (an uncertainty) after scattering. The shaded regions illustrate the ranges wherein both scattering types [NOT (red) and transmission (blue)] fulfill  $v_f/v_0 \ge 0.6$ , i.e.,  $E_{\text{fl}}(v_f)/E_{\text{fl}}(v_0) \geq 0.86$ . This condition produces the margins given in the left-hand side of Table [I.](#page-8-2)

In asynchronous reversible logic gates, the requirements on the arrival time are reduced: operations are not clocked and bits only need to arrive in a definite order. To achieve definite order, the input constraint is a negligible direct interaction between successive input bits. In the LJJ, the interaction strength decays exponentially for large distances compared to $\lambda_J$ [\[29\]](#page-17-23) and in our LJJs we find that $10\lambda_J$ is a suitable distance for negligible interactions. For an asynchronous 1-input gate such as the 1-bit BSR, this corresponds to a minimum required delay time between two input fluxons, $T_d > \max(10\lambda_J/v_0, \tau_{\max})$, where $v_0$ is the (common) input velocity and $\tau_{\text{max}}$ is the maximum duration of all possible gate operations. To explore the role of the delay time $T_d$, this section investigates (i) the dependence of the BSR on $T_d$ and (ii) vice versa, i.e., the modification of the fluxon delay *after* the BSR gate. Finally, the gate-induced timing uncertainty is compared with that arising from fluxon jitter.
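
The delay-time criterion can be evaluated directly. The sketch below works in normalized time units of $1/\omega_J$, assuming $c = \lambda_J \omega_J$; the gate duration `tau_max` is not given numerically in this section, so it is left as a free, hypothetical input.

```python
import math

def min_delay_time(v0_over_c, tau_max, spacing_in_lambda_J=10.0):
    """Lower bound max(10*lambda_J/v0, tau_max) on T_d, in units of 1/omega_J."""
    spacing_time = spacing_in_lambda_J / v0_over_c
    return max(spacing_time, tau_max)

def in_josephson_periods(t):
    """Convert a time in units of 1/omega_J into units of 1/nu_J."""
    return t / (2.0 * math.pi)

if __name__ == "__main__":
    # With tau_max set to zero, only the fluxon-spacing term contributes.
    t_min = min_delay_time(0.6, tau_max=0.0)
    print("T_d,min * omega_J =", t_min)
    print("T_d,min * nu_J    =", in_josephson_periods(t_min))
```

For $v_0 = 0.6c$ the spacing term alone gives $T_d \omega_J > 10/0.6 \approx 16.7$, i.e., about 2.7 Josephson periods $1/\nu_J$.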

Next, we describe the uncertainties created by variation of the fluxon delay time and different gate operations. In the underlying simulations, we study a 1-bit BSR initialized with $S = -1$ and two consecutive input fluxons with a delay time $T_d$ between them. The initialized gate is in a steady state but the first gate operation leaves residual energy at the interface. Thus the efficiency of the second operation will depend not only on its own dynamics (operation type) but also on that of the preceding one. As a result, each of the four input combinations $(\sigma_1, \sigma_2)$ generates a distinct curve (curves shown in different colors) for the output velocity $v_f$ of the second fluxon, as shown in Fig. [7](#page-9-0). These curves exhibit oscillations in $v_f$ with an approximate periodicity of $1/\nu_J$, due to the constructive or destructive interaction of the input fluxon with the oscillations remaining from the preceding gate operation. While Fig. [7(a)](#page-9-0) shows the situation in the absence of damping, in Figs. [7(b)](#page-9-0) and [7(c)](#page-9-0) we show the results for damping added to the three interface JJs. These JJs are shunted by external capacitances already, unlike those in the LJJ. As an example of some damping, we test a case where one has a lossy dielectric in the capacitor such that the effective loss tangent of the JJ and shunt capacitor together is $\tan\delta(\omega = \omega_J) = 4 \times 10^{-3}$. For reference, this corresponds to parallel resistances of $\hat{R}_J = 1/(\tan \delta\, \omega_J \hat{C}_J)$ and $R_J^B = 1/(\tan \delta\, \omega_J C_J^B)$, in parallel to the shunt capacitors $\hat{C}_J$ and $C_J^B$ in Fig. [1(a)](#page-2-0). Here, we choose the LJJ frequency $\omega_J$ as the reference for the loss tangent because the interface elements temporarily oscillate with approximately that frequency during the gate operation. The added damping in Fig. [7(b)](#page-9-0) gives rise to slightly increased $v_f$ variations and slightly reduces the average $v_f$ relative to the undamped case [Fig. [7(a)](#page-9-0)]. We believe that these differences arise because the BSR parameters were optimized in the absence of damping. However, the same damping after longer times, as shown in Fig. [7(c)](#page-9-0), diminishes $v_f$ variations, because the residual gate energy has been reduced significantly over time. In

<span id="page-9-0"></span>

FIG. 7. The output-to-input velocity ratio of 1-input BSR under variation of the delay time $T_d$ of the input fluxon to a preceding input fluxon, for (a) undamped and (b),(c) weakly damped JJs in the SR interface. The BSR efficiency depends, of course, on the gate operation of the current input fluxon (polarity $\sigma_2$) but also on the preceding gate operation (input polarity $\sigma_1$ and SR initialized with $S = -1$), as illustrated by the four different curves in each panel. The output velocity $v_f$ oscillates with approximate periodicity $1/\nu_J$ due to energy remaining at the SR interface after the preceding gate operation. The interface parameters are those given in Table [I](#page-8-2), where in (b) and (c) we add shunt resistors to the three interface JJs, corresponding to a loss tangent of $\tan\delta(\omega = \omega_J) = 4 \times 10^{-3}$: $\hat{R}_J = 1/(\tan \delta\, \omega_J \hat{C}_J)$ and $R_J^B = 1/(\tan \delta\, \omega_J C_J^B)$ [cf. Fig. [1(a)](#page-2-0)]. By adding circuit damping, the $v_f$ oscillations increase slightly and the average $v_f$ is slightly reduced: see (b) relative to (a). However, for long delay times, damping will diminish $v_f$ variations [see (c) relative to (b)] and allow asynchronous gate operations independent of input timings.

addition, this reduces the  $v_f$  variations between operation types. This shows that the output velocities of gates can be constrained to a range using damping and a specified minimum delay time, regardless of the number of previous gate operations.
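
To give a feeling for the size of the added damping, the shunt resistances can be expressed in the normalized form $R\,\omega_J C_J = 1/(\tan\delta \cdot C/C_J)$, using the capacitance ratios of Table I. This is only an evaluation of the formula quoted above; the function name and the choice of normalization are assumptions made for illustration.

```python
def shunt_resistance_normalized(cap_ratio, tan_delta=4e-3):
    """Return R * omega_J * C_J for a shunt capacitance C = cap_ratio * C_J.

    Follows R = 1 / (tan_delta * omega_J * C), evaluated at omega = omega_J.
    """
    return 1.0 / (tan_delta * cap_ratio)

if __name__ == "__main__":
    # Capacitance ratios from Table I: termination JJs (hat C_J) and rail JJ (C_J^B).
    print("termination JJ:", shunt_resistance_normalized(5.4))
    print("rail JJ:       ", shunt_resistance_normalized(7.5))
```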

The variability of the gate output velocities and the operation-dependent gate durations gives rise to timing uncertainties for the fluxon arrival at a later gate. Take, for example, the 2-bit SISO register of Fig. [4(a)](#page-7-2), where we now consider two input fluxons arriving at the first BSR with an initial delay time $T_d$ between them, as sketched in Fig. [8](#page-10-1). After two bits have passed through the first BSR of the SISO, the delay time is in general

<span id="page-10-1"></span>

FIG. 8. A sketch for operation of a 2-bit serial gate, e.g., a 2-bit SISO register, for the input of two successive fluxons separated by a delay time $T_d$. The duration $\tau_j$ and output velocity $v_{f,j}$ ($j = 1, 2$) of each gate operation depend on the operation type and thus on the input state $(\sigma_1, \sigma_2)$. Due to different $\tau_j$ and different $v_{f,j}$, the delay time at the following gate, $T'_d$, will, in general, differ from $T_d$, depending on the distance $\Delta x$ between the two gates.

modified, due to different durations $\tau_j$ ($j = 1, 2$) of the two operations at that BSR. The fluxon delay time also changes during their subsequent motion from the first to the second BSR, due to the different exit velocities $v_{f,j}$ ($j = 1, 2$) from the first BSR. The final fluxon delay time is therefore $T'_d = T_d + \tau_2 - \tau_1 + \Delta x (v_{f,2}^{-1} - v_{f,1}^{-1})$, where $\Delta x = (N + 2)a$ is the length of the LJJ section between the two BSR gates. Here, the $\tau_j$ and $v_{f,j}$ depend on operation type and on $T_d$. We estimate the final delay time $T'_d$ from the simulations of two fluxons given in Fig. [7(c)](#page-9-0). The figure shows the output velocity of the second fluxon, $v_{f,2}$. The output velocity of the first fluxon is only operation dependent ($v_{f,1} = 0.91v_0$ and $0.95v_0$ for the NOT and transmission type, respectively) and is shown as a label in Figs. [4(b)](#page-7-2)–[4(e)](#page-7-2). We also obtain $\tau_j$ from the simulations (not shown). From the different input combinations $(\sigma_1, \sigma_2)$ and samples $T_d$, we determine the mean $\langle T'_d - T_d \rangle$ and the maximum $\max(T'_d - T_d)$ of the delay time changes as functions of $\Delta x$. Averaging over all input combinations, the delay time remains unchanged from one BSR to the next: $\langle T'_d - T_d \rangle / t_{\text{LJJ}} \approx 0$, where $t_{\text{LJJ}} = \Delta x/v_0$ is the nominal passage time in the connecting LJJ and we assume $v_0 = 0.6c$. However, we find that the dominant delay-time change is caused by different input combinations. The maximum change in delay time is $\max(T'_d - T_d)/t_{\text{LJJ}} \approx +19\%$, which comes from the input combination $(\sigma_1, \sigma_2) = (-1, 1)$. The maximum negative change in delay time is $\min(T'_d - T_d)/t_{\text{LJJ}} \approx -14\%$, which comes from the input combination $(\sigma_1, \sigma_2) = (1, 1)$. For the other two combinations, the delay-time changes are comparably small.
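
The delay-time relation above is simple to evaluate. The sketch below implements it and, as an illustration only, isolates the velocity contribution for a NOT-type operation followed by a transmission-type one, using the nominal ratios 0.91 and 0.95 and equal gate durations. It does not reproduce the simulated $\pm$ values quoted above, which also include the $T_d$-dependent $v_{f,2}$ and the differing $\tau_j$. The function names are hypothetical.

```python
def final_delay(T_d, tau_1, tau_2, dx, v_f1, v_f2):
    """T'_d = T_d + tau_2 - tau_1 + dx * (1/v_f2 - 1/v_f1)."""
    return T_d + tau_2 - tau_1 + dx * (1.0 / v_f2 - 1.0 / v_f1)

def relative_delay_change(tau_1, tau_2, dx, v_f1, v_f2, v0):
    """(T'_d - T_d) / t_LJJ with the nominal passage time t_LJJ = dx / v0."""
    t_LJJ = dx / v0
    return final_delay(0.0, tau_1, tau_2, dx, v_f1, v_f2) / t_LJJ

if __name__ == "__main__":
    v0 = 0.6    # input velocity in units of c
    dx = 15.0   # gate separation in units of lambda_J (as in Fig. 4)
    # First fluxon: NOT-type (0.91 v0); second fluxon: transmission-type (0.95 v0).
    change = relative_delay_change(0.0, 0.0, dx, 0.91 * v0, 0.95 * v0, v0)
    print("velocity-only delay change / t_LJJ =", change)
```

A faster second fluxon catches up with the first, so the velocity term alone is negative in this example.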

As mentioned above, asynchronous gates require a minimum delay time between bits to ensure that their interactions are negligible. Since the delay time $T'_d$ after the gate can decrease as a function of the LJJ length $\Delta x$ [for the case $(\sigma_1, \sigma_2) = (1, 1)$], the LJJ should not be too long to still meet the minimum-delay criterion. We estimate an upper limit of $\Delta x$ in the following way. The initial delay time $T_d$ will likely be a factor $p > 1$ of the minimum delay time $T_{d,\text{min}} = 10\lambda_J/v_0$. After the gate operation, the final delay time may be decreased to $\min(T'_d) = p\, 10\lambda_J/v_0 - 0.14 \Delta x/v_0$. Since $T'_d > T_{d,\min}$ has to be fulfilled, one obtains the upper limit for the LJJ length of $\Delta x < 70(p - 1)\lambda_J$. As a lower limit for the gate-connecting LJJ length, we set $\Delta x > 10\lambda_J$, i.e., the same value that also ensures negligible fluxon-fluxon interactions. In our experience, LJJs shorter than that can hinder the free fluxon motion between gates.
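
The upper limit follows in one step from the minimum-delay criterion. Writing the reduced delay after the gate as $T'_d$, the worked inequality is:

$$
p\,\frac{10\lambda_J}{v_0} - 0.14\,\frac{\Delta x}{v_0} > \frac{10\lambda_J}{v_0}
\quad\Longrightarrow\quad
\Delta x < \frac{10\,(p - 1)}{0.14}\,\lambda_J \approx 71\,(p - 1)\,\lambda_J ,
$$

which is rounded to $70(p - 1)\lambda_J$ in the text. Together with the lower limit $\Delta x > 10\lambda_J$, a nonempty window for $\Delta x$ requires $p \gtrsim 1.14$. As an example, $p = 2$ gives a window of roughly $10\lambda_J < \Delta x < 70\lambda_J$.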

Another source of timing error is jitter from the LJJs, which is the fluctuation $\sigma_t$ of fluxon passage times caused by thermal noise during its motion in the LJJ. In the ballistic regime of underdamped LJJs, where $\alpha \omega_J t_{\text{LJJ}} \ll 1$, the jitter error $\sigma_t/t_{\text{LJJ}}$ is expected to be small [\[29,](#page-17-23)[30\]](#page-17-24), where $\sigma_t$ is the standard deviation of the LJJ passage time. Herein, $\alpha$ is the damping coefficient of the sine-Gordon equation [\[32\]](#page-17-25) and is given by the loss tangent of the LJJ, $\alpha = g_J/(\omega_J c_J)$, with conductance $g_J$ and capacitance $c_J$ per unit length. In our discrete LJJs, this corresponds to the loss tangent of each JJ, $\alpha = \tan \delta = 1/(\omega_J C_J R_J)$. For example, in a discrete Nb LJJ with energy scale $E_0/(k_B T) \approx 50$ (cf. Sec. [IV](#page-13-0)), a JJ loss tangent of $\tan\delta = 2 \times 10^{-3}$ typical for large-area AlO<sub>x</sub> barriers, and fluxon speed $0.6c$, a conservative estimate for the jitter error [\[30\]](#page-17-24) amounts to $\sigma_t/t_{\text{LJJ}} \lesssim 2\%$ for motion over $(10–70)\lambda_J$, corresponding to 30–190 cells of our discrete LJJ. A more refined model [\[31\]](#page-17-26) predicts even smaller jitter error. The jitter error for the relevant LJJ lengths is thus expected to be much smaller than the above-estimated timing uncertainty arising from the gate operation. We expect that thermal noise in the gate interface itself will also create jitter of the same order of magnitude as the LJJs. Therefore, we do not simulate thermal noise in this work.

## <span id="page-10-0"></span>**III. COLLECTIVE-COORDINATE ANALYSIS**

For solitons and other collective excitations of a many-body system, the collective-coordinate (CC) method is a powerful way to reduce the many degrees of freedom to a few essential coordinates [\[28\]](#page-17-22). We have previously developed such a CC model for the fundamental (1-bit) RFL gates [\[17\]](#page-17-11). Here, we extend the model to the BSR, in particular the 1-input BSR of Fig. [1](#page-2-0). To this end, we parametrize the LJJ fields left and right of the interface (at $x = 0$) with the ansatz

<span id="page-10-2"></span>
$$
\begin{aligned}
\phi(x < 0) &= (\phi^{(\sigma, X_L)} + \phi^{(-\sigma, -X_L)})(x) + 2\pi (k_L - 1 + \sigma), \\
\phi(x > 0) &= (\phi^{(-\sigma, X_R)} + \phi^{(\sigma, -X_R)})(x) + 2\pi (k_R - 1).
\end{aligned} \tag{11}
$$

Each (left and right) field consists of a linear superposition of a fluxon and its mirror antifluxon, where $\phi^{(\sigma,X)}$ is the phase field of a fluxon of polarity $\sigma$, which we model as a kink equivalent to the soliton solution of the LJJ field [\[28\]](#page-17-22), $\phi^{(\sigma,X)}(x,t) = 4 \arctan (e^{-\sigma (x-X)/W})$. Herein, the time-dependent fluxon positions $X_{L,R}(t)$ serve as the dynamical coordinates of the model, while the fluxon width $W$ is taken to be constant in a so-called adiabatic approximation [\[33\]](#page-17-27). As in Sec. [II B](#page-3-3), the integers $k_{L,R}$ describe the vacuum levels of the left and right phase fields before the arrival of the fluxon. The resulting rail-JJ phase $\phi^B = \phi_L - \phi_R = 2\pi (k_L - k_R)$ corresponds to an initial orientation $S = (k_L - k_R)$ of the stored flux quantum. In comparison, the CC model developed for 1-bit RFL gates [\[17\]](#page-17-11) is based on Eq. [(11)](#page-10-2), with the special case $k_L - k_R = 0$.

The ansatz [\(11\)](#page-10-2) neglects that for  $S \neq 0$ , the LJJ fields may deviate in the vicinity of the interface from the vacuum levels, as modeled by the bound states [see Eq.  $(5)$ ]. However, considering the relatively small bound-state amplitudes of the BSR with  $|S| = 1$ , this approximation seems justified.
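
The mirror-fluxon ansatz of Eq. (11) can be checked numerically. The sketch below evaluates the left and right fields and the rail phase $\phi^B = \phi_L - \phi_R$ at the interface, in units where $W = 1$. The function names are hypothetical, and the sketch only evaluates the parametrization; it does not solve the CC equations of motion.

```python
import math

TWO_PI = 2.0 * math.pi

def kink(x, sigma, X, W=1.0):
    """Fluxon phase field 4*arctan(exp(-sigma*(x - X)/W)); exponent clamped against overflow."""
    arg = -sigma * (x - X) / W
    return 4.0 * math.atan(math.exp(min(arg, 700.0)))

def phi_left(x, X_L, sigma, k_L, W=1.0):
    """Field for x < 0: fluxon at X_L plus mirror antifluxon at -X_L, plus offset."""
    return kink(x, sigma, X_L, W) + kink(x, -sigma, -X_L, W) + TWO_PI * (k_L - 1 + sigma)

def phi_right(x, X_R, sigma, k_R, W=1.0):
    """Field for x > 0: antifluxon at X_R plus mirror fluxon at -X_R, plus offset."""
    return kink(x, -sigma, X_R, W) + kink(x, sigma, -X_R, W) + TWO_PI * (k_R - 1)

def rail_phase(X_L, X_R, sigma, k_L, k_R, W=1.0):
    """Rail-JJ phase phi^B = phi_L - phi_R evaluated at the interface x = 0."""
    return phi_left(0.0, X_L, sigma, k_L, W) - phi_right(0.0, X_R, sigma, k_R, W)

if __name__ == "__main__":
    sigma, k_L, k_R = 1, 0, 1  # stored state S = k_L - k_R = -1
    print("initial  phi^B / 2pi:", rail_phase(-20.0, 0.0, sigma, k_L, k_R) / TWO_PI)
    print("center   phi^B / 2pi:", rail_phase(0.0, 0.0, sigma, k_L, k_R) / TWO_PI)
    print("NOT exit phi^B / 2pi:", rail_phase(0.0, 20.0, sigma, k_L, k_R) / TWO_PI)
```

For $S = -1$ and $\sigma = 1$, the rail phase goes from $-2\pi$ (fluxon far to the left) through $0$ (center of coordinate space) to $+2\pi$ (antifluxon far to the right), matching the $4\pi$ phase change of the NOT-type operation.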

### **A. CC-model parametrization and potential**

Examples for the parametrization of Eq. [(11)](#page-10-2) are shown in the insets of Fig. [9](#page-11-0) for different points in coordinate space $(X_L, X_R)$. Figures [9(a)](#page-11-0)–[9(c)](#page-11-0) represent the different configurations $(k_L, k_R) = (0, 0), (1, 0),$ and $(0, 1)$, respectively, for fluxon polarity $\sigma = 1$. A fluxon ($\sigma = 1$) initially situated in the left LJJ far away from the interface at $X/\lambda_J \ll -1$, is approximated by Eq. [(11)](#page-10-2) with the coordinate $X_L = X$, while $X_R = 0$ describes the absence of excitations in the right LJJ (see the leftmost insets in all panels of Fig. [9](#page-11-0)). For this initial state, Eq. [(11)](#page-10-2) forms a step at the interface ($x = 0$) between $2\pi k_L$ to the left and $2\pi k_R$ to the right, corresponding to the initially stored flux state $S = (k_L - k_R)$. With $k_{L,R}$ and $\sigma$ set by the initial state, Eq. [(11)](#page-10-2) fulfills the boundary conditions, $\phi(x \to -\infty) = 2\pi k_L + 2\pi \sigma$ and $\phi(x \to \infty) = 2\pi k_R$, for any finite $X_{L,R}$. Under these boundary conditions, four different asymptotic single-fluxon states are permitted and these can be parametrized through suitable choice of $X_{L,R}$: a fluxon (antifluxon) in the left LJJ is parametrized by $X_L < 0$ ($X_L > 0$) together with $X_R = 0$ and a fluxon (antifluxon) in the right LJJ is parametrized by $X_L = 0$ together with $X_R < 0$ ($X_R > 0$). In the center of the coordinate space, $(X_L, X_R) = (0, 0)$, the phase distribution forms a step between $2\pi(k_L + \sigma)$ to the left and $2\pi k_R$ to the right of the interface, where $\phi^B = 2\pi (k_L - k_R + \sigma)$. At points near $(X_L, X_R) \approx (0, 0)$, the step is modified by the possible excitations near the interface; see, e.g., the inset in the bottom-right corner of Fig. [9(c)](#page-11-0). In the corners of the configuration space where $(|X_L| \gg 1, |X_R| \gg 1)$, Eq. [(11)](#page-10-2) describes unavailable (high-energy) two-fluxon states (not shown).

<span id="page-11-0"></span>

FIG. 9. The CC potentials  $U(X_L, X_R)$  and trajectories  $(X_L, X_R)(t)$  (red line) for a 1-input BSR with polarity  $\sigma = 1$  of incoming fluxon and with initially stored states (a)  $S = 0$ , (b)  $S = 1$ , and (c)  $S = -1$ . Equipotential lines at the following energies are shown: stored bit energy,  $E_S = 2\pi^2 (L\lambda_J)/(L_s a)S^2$  (black); initial energy,  $E_{\text{init}} = E_S + E_{\text{fl}}$  (gray); and potential energy at center,  $U(0, 0) = 2\pi^2(L\lambda_J)/(L_s a)(S+\sigma)^2$  (brown). The CC model is based on the mirror-fluxon ansatz [see Eq. [\(11\)\]](#page-10-2), which is illustrated in the insets for various points  $(X_L, X_R)$  in coordinate space. The initial field distribution before fluxon arrival corresponds to the point  $X_L \ll -\lambda_J$ and  $X_R = 0$  (left inset), where the fields to the left and right of the interface are  $2\pi k_L$  and  $2\pi k_R$ , with  $k_L - k_R = S$ . The CC trajectories (red) show solutions of the CC equations of motion [see Eq. [\(14\)\]](#page-12-0) and illustrate (a) loading of the storage cell with a flux quantum (not optimized), (b) transmission-type BSR dynamics, and (c) NOT-type BSR dynamics. In all cases, the CC trajectories show good agreement with trajectories obtained from the circuit simulation results  $\phi_n^{(l,r)}(t)$  fitted to the form of Eq. [\(11\)](#page-10-2) (blue markers). The BSR parameters are those of Table [I.](#page-8-2) Note that *U* depends on the initial stored state *S* and the input-fluxon polarity  $\sigma$  as the product  $\sigma S$, such that for  $\sigma = -1$, the potential (and dynamics) of (b) and (c) would be exchanged, while (a) would remain unchanged.

Using Eq. $(11)$, we can derive the CC model for the BSR. This derivation is discussed in detail in Appendix [A,](#page-15-0) while here we simply summarize the results. After inserting Eq. [\(11\)](#page-10-2) into the system Lagrangian, Eq. [\(1\)](#page-3-1) is simplified to

$$
\frac{\mathcal{L}}{E_0} = \frac{1}{2c^2} \begin{pmatrix} \dot{X}_L & \dot{X}_R \end{pmatrix} \mathbf{M} \begin{pmatrix} \dot{X}_L \\ \dot{X}_R \end{pmatrix} - U(X_L, X_R), \tag{12}
$$

where $U(X_L, X_R)$ is the dimensionless CC potential and the mass matrix **M** is composed of the coordinate-dependent dimensionless elements $M_{ii} = m_i(X_i)$ and $M_{i,j\neq i} = m_{LR}(X_L, X_R)$ ($i, j = L, R$). The masses $m_i$, mass coupling $m_{LR}$, and CC potential *U* are given in Eqs. $(A5)$–$(A7)$, respectively. Compared with the CC model of the 1-bit RFL gates [\[17\]](#page-17-11), an additional term,

$$
u_s = \frac{1}{2} \left( \sigma (\phi_L - \phi_R + 2\pi f_E) \right)^2 = \frac{1}{2} \left( 2\pi \sigma (k_L - k_R) + 8 \arctan e^{X_L/W} - 8 \arctan e^{-X_R/W} + 2\pi (1 + \sigma f_E) \right)^2, \tag{13}
$$

contributes to the CC potential *U* [cf. Eq. [\(A7\)\]](#page-15-2), which stems from the shunt current through the inductor $L_s$.

The diagonal elements $m_i$ of the mass matrix **M** vary with $X_i$ near the interface but asymptotically ($|X_i| \gg \lambda_J$) approach a constant value, $m_i = 8\lambda_J/W$. The mass coupling $m_{LR}$ is exponentially suppressed far away from the interface but is finite near it. It is proportional to the rail-JJ capacitance $C_J^B$ and this explains the important role of $C_J^B$ for the forward scattering of a fluxon from one LJJ to another in many gates. In the fundamental (NOT and ID) RFL gates, mass coupling is the dominant coupling mechanism between the LJJs, whereas coupling generated by the potential is negligible since the potential gradient always acts perpendicular to the coordinate axes: $\partial U/\partial X_i|_{X_i=0} = 0$. Relative to the fundamental RFL gates, the BSR has an added contribution $u_s$ in the CC potential $U$, which can generate a much stronger coupling between $X_L$ and $X_R$, depending on the configuration $(k_L, k_R)$.

The CC potential *U* is shown in Fig. [9](#page-11-0) for the BSR parameters of Table [I,](#page-8-2)  $\sigma = 1$ , and three different configurations  $(k_L, k_R)$ . We emphasize that *U* depends parametrically on parameters of the initial state, namely, on the initially stored SFQ,  $S = k_L - k_R$ , and on the polarity  $\sigma$  of the incoming fluxon. Specifically, the dependence enters in the form of the product  $\sigma S$ , as can be seen from Eq. [\(13\)](#page-12-1) for the case of zero external flux through the storage cell,  $f_E = 0$ . All other contributions to the CC potential,  $U_0$ ,  $u_1$ , and  $u_2$  in Eq. [\(A7\),](#page-15-2) are independent of both  $\sigma$  and *S*. Note that the product  $\sigma S$  preserves the invariance of the circuit under phase inversion ( $\sigma \rightarrow -\sigma$  and  $S \rightarrow -S$ ), which would only be broken in the presence of finite  $f_E$ .
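As a quick consistency check (a sketch, assuming $f_E = 0$ and using $\sigma^2 = 1$ and $8\arctan 1 = 2\pi$), Eq. [\(13\)](#page-12-1) can be evaluated at the two reference points of Fig. [9](#page-11-0):

$$
\begin{aligned}
X_L \to -\infty,\ X_R = 0: \quad & u_s = \tfrac{1}{2}\left(2\pi\sigma S + 0 - 2\pi + 2\pi\right)^2 = 2\pi^2 S^2, \\
X_L = X_R = 0: \quad & u_s = \tfrac{1}{2}\left(2\pi\sigma S + 2\pi - 2\pi + 2\pi\right)^2 = 2\pi^2 (S + \sigma)^2 .
\end{aligned}
$$

Multiplied by the prefactor $(L\lambda_J/a)/L_s$ of $u_s$ in Eq. [\(A7\),](#page-15-2) the first line gives the stored-bit energy $E_S$ and the second line gives $U(0,0)$, both in units of $E_0$, as quoted in the caption of Fig. [9.](#page-11-0) The second identification relies on $U_0$, $u_1$, and $u_2$ vanishing at the center point, which follows from their expressions in Appendix [A](#page-15-0) because $\tanh(0) = 0$ and $2z/\sinh(2z) \to 1$ for $z \to 0$.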

Most contributions to the CC potential, $U_0$, $u_1$, and $u_2$ in Eq. $(A7)$, have mirror symmetry about the line $X_R = -X_L$, whereas $u_s$ has this symmetry only for $\sigma (k_L - k_R) = -1$, as can be seen from Eq. [\(13\)](#page-12-1) for $f_E = 0$. In Fig. [9,](#page-11-0) the mirror symmetry is thus seen only in Fig. $9(c)$, where $\sigma (k_L - k_R) = -1$, while in the other cases it is broken. The asymmetry is particularly strong for $\sigma (k_L - k_R) = +1$, in Fig. [9\(b\).](#page-11-0)
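The symmetry statement can be checked numerically. The following is a minimal sketch, assuming $f_E = 0$, coordinates measured in units of $W$, and an arbitrary illustrative sampling grid; the function names are hypothetical and not part of the original work. It evaluates the storage-cell term of Eq. (13) at each point $(X_L, X_R)$ and at its mirror image $(-X_R, -X_L)$.

```python
import math

def u_s(x_l, x_r, sigma_s, w=1.0):
    """Storage-cell term of Eq. (13) for f_E = 0.

    sigma_s is the product sigma * (k_L - k_R); x_l, x_r and w share one unit.
    """
    inner = (2 * math.pi * sigma_s
             + 8 * math.atan(math.exp(x_l / w))
             - 8 * math.atan(math.exp(-x_r / w))
             + 2 * math.pi)
    return 0.5 * inner ** 2

def mirror_asymmetry(sigma_s, points):
    """Largest |u_s(X_L, X_R) - u_s(-X_R, -X_L)| over the sample points."""
    return max(abs(u_s(xl, xr, sigma_s) - u_s(-xr, -xl, sigma_s))
               for xl, xr in points)

# Illustrative grid from -3W to +3W in both coordinates (an assumption).
grid = [(-3 + 0.5 * i, -3 + 0.5 * j) for i in range(13) for j in range(13)]

for sigma_s in (-1, 0, 1):
    asym = mirror_asymmetry(sigma_s, grid)
    print(f"sigma*S = {sigma_s:+d}: max mirror asymmetry of u_s = {asym:.3e}")
```

Under these assumptions the asymmetry vanishes to rounding error only for $\sigma S = -1$ and is largest for $\sigma S = +1$, in line with the discussion of Fig. 9.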

<span id="page-12-2"></span>A fluxon initially at position $X \ll -\lambda_J$, moving with velocity $v_0$, is parametrized by $(X_L, X_R) = (X, 0)$, $(\dot{X}_L, \dot{X}_R) = (v_0, 0)$ and the related fluxon width $W/\lambda_J = \sqrt{1 - v_0^2/c^2}$. For these initial conditions, the initial energy of the system is found from Eq. [\(12\)](#page-12-2) to be $E_{\text{init}} = E_S + E_{\rm fl}(v_0)$. Herein, $E_{\rm fl}(v_0)$ is the initial fluxon energy [see Eq. $(10)$] and $E_S$ is the energy of the initially stored flux quantum *S*, as given in Eq. [\(9\).](#page-5-2) The coordinate space accessible during free evolution with energy $E_{\text{init}}$ is indicated by the corresponding equipotential lines (gray) in Figs. $9(a)$–$9(c)$. In Fig. $9(c)$, this space consists of a central well that connects four asymptotic "scattering valleys." All of these correspond to a single fluxon but differ by its polarity or its position in either the left or right LJJ (cf. the above description). In Figs. $9(a)$ and $9(b)$, only two of these valleys are connected as a result of the asymmetry of the potential, namely, the input valley of the fluxon ($X_L < 0$, $X_R \approx 0$) with the valley ($X_L \approx 0, X_R < 0$) that corresponds to a forward-scattered fluxon.
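The fluxon part of the initial energy can be reproduced from the asymptotic CC expressions. The sketch below is illustrative only: it uses the LJJ mass and potential of Eqs. (A3) and (A2), places the fluxon at $X_L = -20W$ (an arbitrary "far away" choice), and neglects the interface corrections, which are assumed to be exponentially small there. The stored-bit energy $E_S$ from the storage-cell term is not included.

```python
import math

def _ratio(z):
    """2z / sinh(2z), with its limit 1 used at z -> 0."""
    return 1.0 if abs(z) < 1e-8 else 2 * z / math.sinh(2 * z)

def m0(z, w_over_lambda):
    """Dimensionless LJJ mass of Eq. (A3), with z = X / W."""
    return 8 / w_over_lambda * (1 + _ratio(z))

def u0_term(z, w_over_lambda):
    """Single-LJJ summand of the potential U_0 in Eq. (A2)."""
    sech2 = 1 / math.cosh(z) ** 2
    return (4 / w_over_lambda * (1 - _ratio(z))
            + 2 * w_over_lambda * math.tanh(z) * sech2
            * (2 * z + math.sinh(2 * z)))

beta = 0.6                            # v0 / c, the input velocity used in the text
w = math.sqrt(1 - beta ** 2)          # fluxon width W / lambda_J
z_left, z_right = -20.0, 0.0          # fluxon far to the left, right LJJ empty

kinetic = 0.5 * m0(z_left, w) * beta ** 2
potential = u0_term(z_left, w) + u0_term(z_right, w)
e_fluxon = kinetic + potential        # fluxon energy in units of E_0

print(f"W/lambda_J = {w:.3f}, E_fl/E_0 = {e_fluxon:.6f}")
```

For $v_0 = 0.6c$ this gives a fluxon energy of $10E_0$, which matches the bit energy quoted in Sec. IV.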

<span id="page-12-1"></span>The Lagrangian [see Eq.  $(12)$ ] generates the coupled equations of motion

$$
\begin{pmatrix} \ddot{X}_{L} \\ \ddot{X}_{R} \end{pmatrix} = -\mathbf{M}^{-1} \begin{pmatrix} c^{2} \frac{\partial U}{\partial X_{L}} + \frac{1}{2} \frac{\partial m_{L}}{\partial X_{L}} \dot{X}_{L}^{2} + \frac{\partial m_{LR}}{\partial X_{R}} \dot{X}_{R}^{2} \\ c^{2} \frac{\partial U}{\partial X_{R}} + \frac{1}{2} \frac{\partial m_{R}}{\partial X_{R}} \dot{X}_{R}^{2} + \frac{\partial m_{LR}}{\partial X_{L}} \dot{X}_{L}^{2} \end{pmatrix}, \tag{14}
$$

which describe the free dynamics of the coordinates $X_{L,R}$ for fixed initial values of *S* and $\sigma$. Recall that in our CC model [Eq. $(11)$], *S* and $\sigma$ are mere parameters determined by the initial state. However, as becomes clearer in the discussion below, the corresponding values *S'* and $\sigma'$ after the scattering can also be deduced from the asymptotic states of the evolution.
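For reference, a brief sketch of how Eq. [\(14\)](#page-12-0) follows from the Lagrangian in the form of Eq. (A4), assuming (as stated above) that $m_i$ depends only on $X_i$ while $m_{LR}$ depends on both coordinates. The Euler–Lagrange equation for $X_L$ reads

$$
\frac{d}{dt}\left(\frac{m_L \dot{X}_L + m_{LR} \dot{X}_R}{c^2}\right) = \frac{1}{2c^2}\frac{\partial m_L}{\partial X_L}\dot{X}_L^2 + \frac{1}{c^2}\frac{\partial m_{LR}}{\partial X_L}\dot{X}_L \dot{X}_R - \frac{\partial U}{\partial X_L}.
$$

Expanding the total time derivative on the left-hand side, the mixed terms proportional to $\dot{X}_L \dot{X}_R$ cancel, leaving

$$
m_L \ddot{X}_L + m_{LR} \ddot{X}_R = -\left( c^2 \frac{\partial U}{\partial X_L} + \frac{1}{2}\frac{\partial m_L}{\partial X_L}\dot{X}_L^2 + \frac{\partial m_{LR}}{\partial X_R}\dot{X}_R^2 \right).
$$

This is the first row of Eq. [\(14\)](#page-12-0) before multiplication by $\mathbf{M}^{-1}$; the second row follows by exchanging $L \leftrightarrow R$.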

#### <span id="page-12-0"></span>**B. CC-model results**

From Eq. [\(14\),](#page-12-0) we obtain the CC trajectories  $(X_L, X_R)(t)$ shown in Figs.  $9(a)$ –9(c) (red lines). We also compare the CC trajectories with "simulated" trajectories, which are obtained by fitting the phases  $\phi_n^{l,r}(t)$  of the circuit simulations to Eq. [\(11\)](#page-10-2) (blue markers). As Fig. [9](#page-11-0) demonstrates, there is generally very good qualitative and quantitative agreement with the CC-model trajectory.

Next, we describe how an empty BSR circuit is loaded with a SFQ such that it is initialized for regular BSR operation. In Fig. $9(a)$, the trajectory of the incoming fluxon enters the central potential well, where it bounces multiple times. Since no damping has been included in the CC dynamics [see Eq. $(14)$], the trajectory may eventually exit the well, corresponding to a fluxon emitted from the interface into the left or right LJJ (see the insets). In the circuit simulations, however, even though no resistances are included, the generation of plasma waves at the interface effectively constitutes a weak damping mechanism that prevents later escape from the interface. For the coordinates trapped near the center point $(X_L, X_R) \approx (0, 0)$, the phase distribution is close to a step profile (see the inset pointing to the center of the coordinate space), whereas the initial fluxon profile has vanished. From a comparison with the initial state (see the inset pointing to the left-hand side of the coordinate space), where no flux was stored in the BSR ($S = 0$), it is clear that a flux quantum has now been added to the storage cell, i.e., $S' = 1 = S + \sigma$. While Fig. $9(a)$ shows the dynamics for a high-energy fluxon ($\dot{X}_L(0) = v_0 = 0.6c$) in the absence of an external flux $f_E$ through the storage cell, the noise in the loading process can in principle be lowered by using a low-energy fluxon. This decreases the amount of energy that must be lost to capture the fluxon. To avoid back reflection of the low-energy fluxon from the gate interface, one must lower the interface potential by applying a flux $f_E \neq 0$.

Figure  $9(b)$  shows results for the initial state  $S = 1$ . Here, the central well of *U* is separated by a potential barrier from the input valley. The trajectory is shown to follow along the curved potential into the scattering valley, which corresponds to a fluxon in the right LJJ, while *S* remains unaffected (inset). This process thus corresponds to the transmission of a fluxon without topological change—the initial phase difference of  $2\pi$  between the left and right LJJ is therefore roughly maintained during the scattering. In comparison, the dynamics of an ID gate are more complicated (as well as longer in duration), strongly relying on mass-coupling forces [cf. Fig.  $4(a)$  in Ref. [\[17\]](#page-17-11)].

Figure $9(c)$ shows results for the BSR initially in the state $S = -1$. Here, the CC potential resembles that of the fundamental 1-bit gates [cf. Fig. $4(a)$ in Ref. [\[17\]](#page-17-11)]. The resulting CC dynamics is similar to the NOT fundamental gate dynamics [cf. Fig. $7(c1)$ in Ref. [\[17\]](#page-17-11)] and can be explained by the combined effect of the strong mass coupling (large $C_J^B$) together with the forces from the potential ($I_c^B, I_s$) and the mass gradients ($C_J^B, \hat{C}_J$). The resulting state after the scattering corresponds to a forward-scattered antifluxon, while the stored flux is inverted: $S' = \sigma = -S$ (inset).

As these examples show, the CC model—though heavily simplifying the many-JJ circuit to a reduced system with 2 degrees of freedom—describes the fluxon scattering in the BSR accurately. Furthermore, it is a good tool to interpret and predict the fluxon dynamics at circuit interfaces. With the help of the CC model, we are able to understand how the product  $\sigma S$ , which represents the relative polarity of moving and stored bit states, changes the potential landscape. This in turn changes the scattering dynamics, as we describe for the three relevant cases (initialization and the two distinct BSR gate operations).
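The three cases can be summarized as a purely logical model. The sketch below is not a circuit simulation; the function name `bsr_scatter` is hypothetical, and the loading case for $\sigma = -1$ is an assumption inferred from the phase-inversion symmetry noted in the caption of Fig. 9.

```python
def bsr_scatter(sigma, stored):
    """Return (output polarity or None, new stored flux) for one input fluxon.

    sigma: polarity of the incoming fluxon, +1 or -1.
    stored: flux quantum S in the storage cell, one of -1, 0, +1.
    """
    if sigma not in (1, -1) or stored not in (-1, 0, 1):
        raise ValueError("sigma must be +/-1 and stored must be -1, 0 or +1")
    if stored == 0:
        # Loading, Fig. 9(a): the fluxon is captured, S' = S + sigma.
        return None, sigma
    if sigma * stored == 1:
        # Transmission, Fig. 9(b): polarity and stored flux are unchanged.
        return sigma, stored
    # NOT-type scattering, Fig. 9(c): antifluxon output, S' = sigma = -S.
    return -sigma, sigma

for s in (-1, 0, 1):
    for sig in (1, -1):
        print(f"S = {s:+d}, sigma = {sig:+d} -> {bsr_scatter(sig, s)}")
```

For an initialized cell ($S = \pm 1$), both gate operations amount to the same rule: the outgoing fluxon carries the previously stored polarity and the cell keeps the incoming polarity.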

#### <span id="page-13-0"></span>**IV. DISCUSSION**

Shift registers [\[6](#page-17-28)[,34–](#page-17-29)[38\]](#page-17-30) constitute a class of superconducting memory [\[2,](#page-16-1)[39\]](#page-17-31) intended to provide fast access to small amounts of data. They are usually loaded as serial input and intended as first-in–first-out (FIFO) buffers. This is different than RAM, which is a larger memory intended for addressing in a two-dimensional array. In superconducting logic, the currently available RAM [\[40](#page-17-32)[–42\]](#page-18-0) typically builds upon vortex-transition memory [\[43](#page-18-1)[,44\]](#page-18-2) for SFQ-based logic. However, new AQFP [\[45\]](#page-18-3), magnetic superconducting hybrid [\[46\]](#page-18-4), nanowire-based [\[47,](#page-18-5)[48\]](#page-18-6), and DRAM [\[49\]](#page-18-7) memories can provide sufficient memory for near-term superconducting logic applications.

In the future, we intend to describe how to build large shift registers. The 2-bit shift register described above is made from two 1-bit BSRs. It shows that ballistic gates can in principle be executed in a sequence of two without external power with certain defined constraints. The same holds for the 2-input BSRs and, furthermore, the simulation data indicate that multi-bit-line BSRs can be operated with similar performance. This would allow some bit lines to be used for input and others to be used for sending data to the needed outputs. Having a logical depth of 2 without external power provides a useful feature, because almost every gate is currently clocked (and not sequenced) in most SFQ logic. This could also lead to new methods to incorporate memory with logic (cf. Ref. [\[50\]](#page-18-8)). In principle, RFL BSRs could allow greater density than shift registers that use dual-rail encoding [\[36\]](#page-17-33), because RFL already uses a SFQ for both bit states. We finally note that, in principle, a dual-rail RSFQ could produce an output with both rails merged, such that it feeds data into a BSR.
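To illustrate the logical behavior of BSRs in series, the following sketch treats each initialized BSR as an ideal swap between the moving and the stored bit. It ignores all timing and energy constraints mentioned above, and the function name and the example bit stream are hypothetical.

```python
def shift_register(bits, initial_state):
    """Pass a stream of fluxon polarities through ideal BSRs in series.

    bits: iterable of +1 / -1 input polarities, in arrival order.
    initial_state: stored polarities (+1 / -1), first BSR first.
    Returns (list of output polarities, final stored state).
    """
    state = list(initial_state)
    outputs = []
    for bit in bits:
        moving = bit
        for k in range(len(state)):
            # Elastic swap: the fluxon leaves with the stored polarity and
            # the cell keeps the polarity that arrived.
            moving, state[k] = state[k], moving
        outputs.append(moving)
    return outputs, state

stream = [+1, -1, -1, +1]
outputs, state = shift_register(stream, initial_state=[+1, +1])
print("outputs:", outputs)
print("final state:", state)
```

In this idealized picture the two-BSR chain acts as a FIFO with a delay of two bits: the first two outputs are the initial contents and each later output repeats the input from two steps earlier.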

The 2-input BSRs in this work (with two outputs) can in principle be used to shift information on demand in two dimensions (vertically and horizontally). In previous work, we have also shown how to make a CNOT gate [\[18\]](#page-17-13) that can provide XOR operations. In the future, we plan to release gates to do full digital processing. The XOR operation could give the sum operation for either a half or a full adder. To realize such an adder, we plan to study a reversible gate that executes the missing multiplication function.

Logic gates generally have different execution times (delays) [\[51\]](#page-18-9) and, in SFQ logic, delays and jitter lead to requirements for clock and bit synchronization [\[52–](#page-18-10)[54\]](#page-18-11) and sometimes bit arbitration [\[55\]](#page-18-12). However, some progress has been made recently in this area with the introduction of a second time constant within an AND and OR gate to conveniently define a timing window for receiving the SFQ (bit $= 1$ state) [\[56\]](#page-18-13). Additionally, our asynchronous gates provide a positive development for RFL (and any future bipolar-SFQ logic) because the timing requirement basically reduces to a requirement of bit order. In RSFQ, the (unipolar-bit) toggle (T) flip flop is comparable in that it uses no clocking and the bits only need to arrive in a particular order.

Superconducting logic can in principle approach thermodynamic reversibility, meaning that there is no lower limit to energy dissipation. In the past, this has been studied with so-called adiabatic logics [\[20](#page-17-14)[–24\]](#page-17-17). In these logic types, adiabatic clock wave forms drive the gates such that the circuit state is always close to a potential-energy minimum. In these adiabatically reversible logic types, the dissipated energy scales with the inverse of the clock period such that the thermodynamic limit is approached by lowering the clock frequency. Ballistically reversible logic follows an alternative approach. In ballistic superconducting logic, fluxons in LJJs have been chosen as the bit carriers. The gates are powered by the same input fluxons, while no external power is applied. Output states with the same energy as the input states are accessible through the free dynamics powered by the energy of the fluxon entering the gate circuit. Near-thermodynamic reversibility relies on successful exiting of the output fluxon state with a practical velocity while only small amounts of energy are lost to other degrees of freedom. BSRs, similar to previous ballistic RFL gates, satisfy these reversible criteria.

When setting the gate energy efficiency to $\geq 86\%$, we obtain the wide parameter margins shown in Table [I.](#page-8-2) It follows that the energy cost per operation is $E_{op} < 0.14E_B$, where $E_B$ is the energy of the input bit (which is independent of its bit state). For the assumed input velocity of the fluxon $v_0/c = 0.6$, this bit energy is $E_B = 10E_0$, compared with the rest energy $8E_0$ of a stationary bit [cf. Eq. $(10)$]. Herein, the energy scale $E_0 = I_c \Phi_0 \lambda_J / (2\pi a)$ depends on the LJJ fabrication. The BSR can be fabricated from digital foundry materials, such as Nb superconductor with an AlO$_x$ barrier, similar to previous RFL gates [\[27\]](#page-17-21). For example, in an LJJ built with the discreteness used in our simulations, $a/\lambda_J = \sqrt{2\pi I_c L/\Phi_0} = 1/\sqrt{7}$, and with $I_c = 3\ \mu\text{A}$, an energy cost of $< 3.7$ zJ/op would result. When experimentally optimized and realized, this could compare favorably with state-of-the-art logic efficiency results (cf. Ref. [\[57\]](#page-18-14)). With other materials, one could in principle lower $E_B$ closer to $k_B T$ to achieve an even lower energy cost.
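The energy estimate can be retraced with a few lines of arithmetic. This is a sketch under stated assumptions: the moving-fluxon energy is taken as $8E_0/\sqrt{1 - v_0^2/c^2}$ (which reproduces the quoted $10E_0$ at $v_0 = 0.6c$), and the flux quantum is the standard value $h/2e$; variable names are illustrative.

```python
import math

PHI_0 = 2.067833848e-15        # magnetic flux quantum h/(2e), in Wb
I_C = 3e-6                     # LJJ junction critical current from the text, in A
LAMBDA_OVER_A = math.sqrt(7)   # lambda_J / a used in the simulations
BETA = 0.6                     # input velocity v0 / c
EFFICIENCY = 0.86              # minimum gate energy efficiency from the text

e_0 = I_C * PHI_0 * LAMBDA_OVER_A / (2 * math.pi)   # energy scale E_0, in J
e_bit = 8 * e_0 / math.sqrt(1 - BETA ** 2)          # assumed moving-fluxon energy
e_op_max = (1 - EFFICIENCY) * e_bit                 # upper bound on loss per op

print(f"E_0  = {e_0 / 1e-21:.2f} zJ")
print(f"E_B  = {e_bit / 1e-21:.1f} zJ")
print(f"E_op < {e_op_max / 1e-21:.2f} zJ")
```

With these inputs the bound on the energy cost comes out slightly below the 3.7 zJ/op quoted above.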

Although it is beyond the scope of this work to specify a full architecture for RFL, it is obvious that the BSRs could be tested by a train of fluxons traveling with some interval between them. For example, in the simulation data of Fig. [3,](#page-5-1) the input fluxons arrive with a time interval of $T = 8/\nu_J$ for clarity (smaller intervals are possible). The energy cost estimated above from the circuit simulation includes the loss to plasma waves (from the imperfectly reversible undamped gates). At very high Josephson frequencies comparable to the superconducting gap, additional loss due to quasiparticles is expected and would therefore set an upper limit to the Josephson frequency $\nu_J$ and the resulting operation speed. Assuming that circuits are made with a Josephson frequency $\nu_J = 44$ GHz and that the above-mentioned time interval is used between bits, we calculate a real time interval of 182 ps/op. From the rate (5.5 GHz) and the above energy calculation, the maximum power loss per bit (during operations) is estimated as 20 pW. Our gates allow a combined benefit of asynchronous timing and energy efficiency. With these modest assumptions, the maximum energy-delay product (EDP) for the shift register is less than $6.6 \times 10^{-31}$ Js $= 10^3 h$. Clocking will cost energy as well and will be addressed in future work. However, we note that an EDP on the order of $1 \times 10^{-22}$ Js has been modeled in a one-stage clocked RSFQ architecture [\[54\]](#page-18-11). In the future, we plan to target low EDP in all of our gates, including those that are less efficient nonballistic gates (see, e.g., Ref. [\[18\]](#page-17-13) for a clock-triggered gate).
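The timing, power, and EDP figures follow from the same inputs. The sketch below recomputes them, assuming the energy bound $0.14 \times 10E_0$ with $I_c = 3\ \mu\text{A}$ and $\lambda_J/a = \sqrt{7}$ as in the preceding paragraph and standard values of $\Phi_0$ and $h$; small percent-level differences from the quoted numbers come from rounding of these inputs.

```python
import math

PHI_0 = 2.067833848e-15     # magnetic flux quantum, in Wb
H_PLANCK = 6.62607015e-34   # Planck constant, in J s
I_C = 3e-6                  # junction critical current from the text, in A
LAMBDA_OVER_A = math.sqrt(7)
NU_J = 44e9                 # assumed Josephson frequency from the text, in Hz
PERIODS_PER_BIT = 8         # bit interval T = 8 / nu_J used in the simulations

e_0 = I_C * PHI_0 * LAMBDA_OVER_A / (2 * math.pi)
e_op_max = 0.14 * 10 * e_0          # energy bound per operation, in J

t_op = PERIODS_PER_BIT / NU_J       # time per operation, in s
rate = 1 / t_op                     # operation rate, in Hz
power_max = e_op_max * rate         # power bound per bit line, in W
edp_max = e_op_max * t_op           # energy-delay product bound, in J s

print(f"T     = {t_op * 1e12:.0f} ps")
print(f"rate  = {rate / 1e9:.1f} GHz")
print(f"P     < {power_max * 1e12:.0f} pW")
print(f"EDP   < {edp_max:.2e} J s = {edp_max / H_PLANCK:.0f} h")
```

The result is an EDP of order $10^3 h$, consistent with the estimate in the text.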

## **V. CONCLUSIONS**

Reversible logic may advance digital computing generally because it allows great improvements in computing efficiency at the gate level. In contrast, end-of-the-road-map CMOS will have an orders-of-magnitude higher energy cost per bit switching. The type of reversible logic gates that we study here (RFL BSRs) are ballistic. By introducing the asynchronous feature to ballistic gates, as we do in this work, we expect greater practicality in our reversible logic family (RFL) since the timing requirements are reduced.

The ADF flux shuttle has provided a pioneering design for a shift register prior to the start of SFQ logic. That logic is thermodynamically irreversible with the energy of the bit dissipated during every logic operation. In contrast to the ADF shuttle, which has bits encoded by SFQ presence, our RFL ballistic gates conditionally invert fluxon polarity, where the fluxon polarity encodes the bit state. In this work, we introduce BSRs that add the feature of memory to previous ballistic multiport gates. The BSRs rely only on the ballistic scattering dynamics between the input fluxon and the stored SFQ. Here, the gate dynamics fall into two cases, which consist of the resonant NOT case, generally used in RFL, and a simpler transmission case.

We perform circuit simulations of a 1-input BSR, as well as a shift register composed of two 1-input BSR gates in sequence. In another design, we introduce a 2-input BSR that can be used as a register with separate write and read ports or, alternatively, as a device to shift the bit state between different bit lines. Furthermore, we discuss how this may be helpful for a register-based memory.

Since the ballistic scattering depends on the stored bit state (unlike previous RFL gates), the 1-input and 2-input BSR constitute the first set of asynchronous reversible logic gates appropriate for feedforward computing. The former (1-input) gate is shown to allow the execution of two in sequence without external power. The latter gate allows more bit lines to be added. We discuss how this is related to logical depth and timing requirements.

Perhaps the most important technical result is that the BSR has wide parameter margins: all margins are above 46% when the energy efficiency is set to 86%. This is far above the variation in today's standard fabrication processes, such that BSRs can be tested.

In addition to circuit simulations, we model the BSR dynamics by a CC model, which reduces the many-JJ degrees of freedom to only two coordinates. With the help of this model, the state-dependent scattering dynamics can be understood from effective potentials in fluxon coordinate space—the BSR scattering potentials are dependent on the initial fluxon and SFQ states.

All SFQ logic types switch in a time equal to or greater than the natural oscillation period of the Josephson junctions. Our logic is a fast reversible logic type in that it is designed to switch in only a few Josephson periods and it has the potential to be faster than adiabatically powered reversible logic, which is generally slowed by the adiabatic power source. Consecutive BSR operations are shown to be possible in fewer than eight Josephson periods. We do not currently see the need for an adiabatic clock, in contrast to other reversible logic families, and this helps enable logic at high speed. Thus, we are optimistic that our ballistic logic may enable high-throughput high-efficiency computation with unpowered gate sequences.

#### **ACKNOWLEDGMENTS**

K.D.O. would like to thank Q. Herr, A. Herr, M. Frank, R. Lewis, N. Missert, I. Sutherland, V. Semenov, K. O'Brien, B. Sarabi, C. Richardson, D. Mountain, G. Herrera, and N. Yoshikawa for stimulating scientific discussions. We thank Seeqc [\[58\]](#page-18-15) for their professional foundry services, which were used to fabricate the RFL gates. W.W. acknowledges funding from Minerva Engineering LLC and would like to thank the Physics Department at the University of Otago for its hospitality.

### <span id="page-15-0"></span>**APPENDIX A: COLLECTIVE-COORDINATE ANALYSIS**

Here, we sketch the derivation of a CC model for the 1-input BSR of Fig. $1(a)$, leading to the results discussed in Sec. [III.](#page-10-0) The procedure is similar to the collective-coordinate analysis for other RFL gates [\[17\]](#page-17-11).

The starting point is the circuit Lagrangian [see Eq.  $(1)$ ] with the interface contribution [see Eq.  $(4)$ ]. Inserting the mirror fluxon ansatz [see Eq.  $(11)$ ], the LJJ contributions become

$$
\frac{1}{E_0} \left( \mathcal{L}_l + \mathcal{L}_r \right) = \sum_{i=L,R} \frac{m_0(X_i)}{2} \frac{\dot{X}_i^2}{c^2} - U_0(X_L, X_R), \quad \text{(A1)}
$$

<span id="page-15-4"></span>
$$
U_0 = \sum_{i=L,R} \left\{ \frac{4\lambda_J}{W} \left( 1 - \frac{2z_i}{\sinh(2z_i)} \right) + \frac{2W}{\lambda_J} \tanh(z_i) \operatorname{sech}^2(z_i) \left[ 2z_i + \sinh(2z_i) \right] \right\},\tag{A2}
$$

<span id="page-15-3"></span>
$$
m_0(X_i) = \frac{8\lambda_J}{W} \left( 1 + \frac{2z_i}{\sinh(2z_i)} \right),\tag{A3}
$$

where  $z_i = X_i/W$  ( $i = L, R$ ). To obtain these expressions, we replace the LJJ sums in  $\mathcal{L}_{l,r}$  by integrals, based on the small discreteness,  $a/\lambda_J \ll 1$ . We evaluate all integrals with boundaries  $(-\infty, 0)$  and  $(0, \infty)$ , which corresponds to including the termination JJs of the interface as part of the LJJ. To correct for this, the corresponding energies have to be subtracted in the interface Lagrangian  $\mathcal{L}_I$  [see Eq. [\(4\)\]](#page-3-2) such that  $\hat{C}_J \rightarrow \hat{C}_J - C_J$  and  $\hat{I}_c \rightarrow \hat{I}_c - I_c$ . After inserting the ansatz Eq. [\(11\)](#page-10-2) also into  $\mathcal{L}_I$ , the full system Lagrangian reads

$$
\frac{\mathcal{L}}{E_0} = \frac{m_L \dot{X}_L^2}{2c^2} + \frac{m_R \dot{X}_R^2}{2c^2} + m_{LR} \frac{\dot{X}_L \dot{X}_R}{c^2} - U(X_L, X_R). \tag{A4}
$$

Herein, the interface modifies the dimensionless mass of Eq. [\(A3\)](#page-15-3) and also contributes a mass-coupling term,

<span id="page-15-1"></span>
$$
m_i(X_i) = m_0(X_i) + \frac{\hat{C}_J - C_J + C_J^B}{C_J \lambda_J / a} (g_I(X_i))^2, \tag{A5}
$$

$$
m_{LR}(X_L, X_R) = \frac{C_J^B}{C_J \lambda_J/a} g_I(X_L) g_I(X_R), \tag{A6}
$$

where the factor  $g_I(X_i) = 4 (\lambda_J/W) \operatorname{sech}(X_i/W)$  describes the local influence of the interface. The dimensionless CC potential of Eq. [\(A2\)](#page-15-4) is also modified,

<span id="page-15-2"></span>
$$
U = U_0 + \frac{\hat{I}_c - I_c + I_c^B}{I_c \lambda_J/a} u_1 + \frac{I_c^B}{I_c \lambda_J/a} u_2 + \frac{L \lambda_J/a}{L_s} u_s, \tag{A7}
$$

with the interface contributions to the potential,

$$
u_1 = \sum_{i=L,R} 8 \operatorname{sech}^2(z_i) \tanh^2(z_i), \tag{A8}
$$

$$
\begin{aligned}
u_2 = {} & -\prod_{i=L,R} \left[ 8 \operatorname{sech}^2(z_i) \tanh^2(z_i) \right] \\
& + \prod_{i=L,R} \left[ 4 \operatorname{sech}(z_i) \tanh(z_i) \left( 1 - 2 \operatorname{sech}^2(z_i) \right) \right]
\end{aligned} \tag{A9}
$$

and

$$
u_s = \frac{1}{2} \left( \sigma (\phi_L - \phi_R) + 2\pi \sigma f_E \right)^2. \tag{A10}
$$

<span id="page-16-2"></span>

FIG. 10. The margins of 1-input BSR under variation of LJJ parameters: (a) the critical current density  $J_c$; (b) the cell inductances  $L$; (c) the cell inductances *L* and JJ areas; (d) the JJ capacitances  $C_J$. The graphs are constructed similar to those in Fig. [6.](#page-8-3) In (a)–(c), the *x* axis shows the resulting variation of the relative discreteness,  $(a/\lambda_J)^2 = L/L_J = 2\pi L I_c/\Phi_0$, instead of the actually varied parameter(s), while in (d) it shows the varied capacitance  $C_J$  relative to its nominal value  $C_J^{(0)}$. All interface parameters are kept constant at values given in Table [I.](#page-8-2)

Using $\phi_L = \phi(x = 0^-) = 8 \arctan e^{\sigma X_L/W} + 2\pi (k_L - 1 + \sigma)$ and $\phi_R = \phi(x = 0^+) = 8 \arctan e^{-\sigma X_R/W} + 2\pi (k_R - 1)$ from Eq. $(11)$, the storage-cell contribution $u_s$ can be written as

$$
u_s = \frac{1}{2} \left( 8 \arctan e^{X_L/W} - 8 \arctan e^{-X_R/W} + 2\pi \sigma (k_L - k_R) + 2\pi (1 + \sigma f_E) \right)^2. \tag{A11}
$$
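The step from Eq. (A10) to Eq. (A11) uses $\arctan e^{-x} = \pi/2 - \arctan e^{x}$ for the case $\sigma = -1$. As a sanity check, the sketch below compares both forms numerically for both polarities, the three configurations of Fig. 9, and an arbitrary small $f_E$; the function names and sample points are illustrative assumptions.

```python
import math

def u_s_a10(x_l, x_r, sigma, k_l, k_r, f_e=0.0, w=1.0):
    """Eq. (A10), with the interface phases phi_L and phi_R written out."""
    phi_l = (8 * math.atan(math.exp(sigma * x_l / w))
             + 2 * math.pi * (k_l - 1 + sigma))
    phi_r = (8 * math.atan(math.exp(-sigma * x_r / w))
             + 2 * math.pi * (k_r - 1))
    return 0.5 * (sigma * (phi_l - phi_r) + 2 * math.pi * sigma * f_e) ** 2

def u_s_a11(x_l, x_r, sigma, k_l, k_r, f_e=0.0, w=1.0):
    """Eq. (A11), the polarity-independent form of the arctan terms."""
    inner = (8 * math.atan(math.exp(x_l / w))
             - 8 * math.atan(math.exp(-x_r / w))
             + 2 * math.pi * sigma * (k_l - k_r)
             + 2 * math.pi * (1 + sigma * f_e))
    return 0.5 * inner ** 2

samples = [(-4.0, 0.0), (-1.0, 0.5), (0.0, 0.0), (0.7, -2.0), (3.0, 3.0)]
max_diff = 0.0
for sigma in (1, -1):
    for k_l, k_r in ((0, 0), (1, 0), (0, 1)):
        for f_e in (0.0, 0.1):
            for x_l, x_r in samples:
                a = u_s_a10(x_l, x_r, sigma, k_l, k_r, f_e)
                b = u_s_a11(x_l, x_r, sigma, k_l, k_r, f_e)
                max_diff = max(max_diff, abs(a - b))

print(f"largest |A10 - A11| over all samples: {max_diff:.2e}")
```

Agreement to rounding error for $\sigma = \pm 1$ confirms that the polarity enters $u_s$ only through the product $\sigma(k_L - k_R)$ and the term $\sigma f_E$.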

## **APPENDIX B: LJJ PARAMETER MARGINS**

Similar to the variation of interface parameters in Fig. [6,](#page-8-3) in Fig. [10](#page-16-2) we show the output-to-input velocity ratio under variations of different LJJ parameters: the JJ critical currents $I_c$ [Fig. [10\(a\)\]](#page-16-2), the cell inductances *L* [Fig. [10\(b\)\]](#page-16-2), the cell inductances *L* along with $I_c$ and the JJ capacitances $C_J$ [Fig. [10\(c\)\]](#page-16-2), and $C_J$ [Fig. [10\(d\)\]](#page-16-2). Figure $10(a)$ corresponds to a variation of the critical current density $J_c$. Figure $10(c)$ can geometrically correspond to linearly scaling the inductor length while scaling the JJ area by the same amount (the dependence of the inductance on the inductor width is more complex). The LJJ parameter variations in general give rise to scalings of (i) the Josephson penetration depth relative to the unit-cell length $\lambda_J/a \equiv \sqrt{L_J/L} = \sqrt{\Phi_0/(2\pi L I_c)}$, (ii) the JJ frequency $\omega_J = 1/\sqrt{L_J C_J} = \sqrt{2\pi I_c/(\Phi_0 C_J)}$, (iii) the LJJ energy scale $E_0 = (\Phi_0/2\pi)^{3/2} \sqrt{I_c/L}$, and (iv) the LJJ impedance $Z = \sqrt{L/C_J}$, as shown in the table of Fig. [10.](#page-16-2) The first three variations, shown in Figs. $10(a)$–$10(c)$, entail variations of the relative lattice spacing $a/\lambda_J$ and we present the data in the respective panels as functions of $(a/\lambda_J)^2$. For most LJJ parameter variations, the margins of this quantity are typically around 60–70%, as, e.g., seen here in Figs. $10(a)$ and $10(b)$. An exception is shown in Fig. $10(c)$, where all parameters of the LJJ are scaled by the same factor, as indicated in the table. Here, the resulting margins of $(a/\lambda_J)^2$ are almost an order of magnitude larger and we attribute this robustness to the invariance of the fluxon energy $10E_0$. In contrast, in Figs. $10(a)$ and $10(b)$, the fluxon energy is lowered relative to its nominal value for decreasing and increasing values of $a/\lambda_J$, respectively.

This leads to characteristic edges seen in the variation data of the transmission-type scattering (blue markers), which appear when the initial kinetic energy of the fluxon is too low to overcome the potential barrier on the transmission path.

We note that for large discreteness,  $(a/\lambda_J)^2 \approx 1$ , the moving fluxon loses energy through excitation of linear plasma modes in the LJJ [\[17](#page-17-11)[,59\]](#page-18-16). Because of the resulting strong damping, the output-to-input velocity ratio in Fig. [10\(c\)](#page-16-2) becomes increasingly ill defined for  $(a/\lambda_J)^2 \lesssim 1$ (it becomes dependent on the distance of the measurement points from the gate interface).

Figure $10(d)$ shows the varied capacitance $C_J$ relative to its nominal value $C_J^{(0)}$. We note that all of our simulations are performed with $a/\lambda_J$ as the sole free parameter and that absolute values of $C_J^{(0)}$, $I_c^{(0)}$, and $L^{(0)}$ are not specified. However, for the Nb fabrication assumed in the discussion of Sec. [IV,](#page-13-0) the nominal values become $L^{(0)} = 16$ pH, $I_c^{(0)} = 3.0\ \mu\text{A}$, and $C_J^{(0)} = 120$ fF. Figure $10(d)$ shows wider $C_J$ margins (140%) compared with the inductance margins of Figs. $10(a)$ and $10(b)$. We attribute this increase to the fact that the fluxon energy is not changed here, unlike in Figs. $10(a)$ and $10(b)$ (cf. the table in Fig. [10\)](#page-16-2). However, the margins are also smaller than in the case of Fig. $10(c)$. In Fig. $10(d)$, we observe that high values of $C_J$ limit the NOT operation (red markers). This is likely due to the lowered LJJ frequency $\omega_J$; the NOT gate is a resonant process that depends on the frequency of the LJJs.
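The nominal values can be cross-checked against the scalings listed above. The sketch below assumes the standard flux quantum and interprets the Josephson frequency of Sec. IV as $\nu_J = \omega_J/2\pi$; the variable names are illustrative.

```python
import math

PHI_0 = 2.067833848e-15   # magnetic flux quantum, in Wb
L_CELL = 16e-12           # nominal cell inductance L, in H
I_C = 3.0e-6              # nominal critical current I_c, in A
C_J = 120e-15             # nominal junction capacitance C_J, in F

l_j = PHI_0 / (2 * math.pi * I_C)            # Josephson inductance L_J
discreteness_sq = L_CELL / l_j               # (a / lambda_J)^2 = L / L_J
omega_j = 1 / math.sqrt(l_j * C_J)           # JJ angular frequency
nu_j = omega_j / (2 * math.pi)               # assumed relation to nu_J
impedance = math.sqrt(L_CELL / C_J)          # LJJ impedance Z

print(f"(a/lambda_J)^2 = {discreteness_sq:.4f}  (1/7 = {1 / 7:.4f})")
print(f"nu_J           = {nu_j / 1e9:.1f} GHz")
print(f"Z              = {impedance:.1f} ohm")
```

Under these assumptions the rounded nominal values reproduce the discreteness $(a/\lambda_J)^2 \approx 1/7$ and the Josephson frequency of about 44 GHz used in Sec. IV.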

- <span id="page-16-0"></span>[1] D. S. Holmes, A. L. Ripple, and M. A. Manheimer, Energy-efficient superconducting computing—power budgets and requirements, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2013.2244634) **23**, 1701610 (2013).
- <span id="page-16-1"></span>[2] I. I. Soloviev, N. V. Klenov, S. V. Bakurskiy, M. Y. Kupriyanov, A. L. Gudkov, and A. S. Sidorenko, Beyond Moore's technologies: Operation principles of a superconductor alternative, [Beilstein J. Nanotechnol.](https://doi.org/10.3762/bjnano.8.269) **8**, 2689 (2017).

- <span id="page-17-0"></span>[3] K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, in SQUID '85, Berlin, Germany (1985).
- <span id="page-17-20"></span>[4] K. K. Likharev and V. K. Semenov, RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.80745) **1**, 3 (1991).
- [5] D. E. Kirichenko, S. Sarwana, and A. F. Kirichenko, Zero static power dissipation biasing of RSFQ circuits, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2010.2098432) **21**, 776 (2011).
- <span id="page-17-28"></span>[6] M. H. Volkmann, A. Sahu, C. J. Fourie, and O. A. Mukhanov, Implementation of energy efficient single flux quantum digital circuits with sub-aJ/bit operation, [Supercond. Sci. Technol.](https://doi.org/10.1088/0953-2048/26/1/015002) **26**, 015002 (2013).
- <span id="page-17-1"></span>[7] M. Tanaka, K. Takata, T. Kawaguchi, Y. Ando, N. Yoshikawa, R. Sato, A. Fujimaki, K. Takagi, and N. Takagi, in [*15th International Superconductive Electronics Conference (ISEC)*](https://doi.org/10.1109/ISEC.2015.7383449) (2015).
- <span id="page-17-2"></span>[8] Y. Harada, H. Nakane, N. Miyamoto, U. Kawabe, E. Goto, and T. Soma, Basic operations of the quantum flux parametron, [IEEE Trans. Magn.](https://doi.org/10.1109/TMAG.1987.1065574) **23**, 3801 (1987).
- <span id="page-17-3"></span>[9] C. L. Ayala, T. Tanaka, R. Saito, M. Nozoe, N. Takeuchi, and N. Yoshikawa, MANA: A monolithic adiabatic integration architecture microprocessor using 1.4-zJ/op unshunted superconductor Josephson junction devices, [IEEE J. Solid-State Circuits](https://doi.org/10.1109/JSSC.2020.3041338) **56**, 1152 (2021).
- <span id="page-17-4"></span>[10] Q. P. Herr, A. Y. Herr, O. T. Oberg, and A. G. Ioannidis, Ultra-low-power superconductor logic, [J. Appl. Phys.](https://doi.org/10.1063/1.3585849) **109**, 103903 (2011).
- <span id="page-17-5"></span>[11] R. Landauer, Irreversibility and heat generation in the computing process, [IBM J. Res. Dev.](https://doi.org/10.1147/rd.53.0183) **5**, 183 (1961).
- <span id="page-17-6"></span>[12] Appendix 2 of Ref. [\[4\]](#page-17-20).
- <span id="page-17-7"></span>[13] T. A. Fulton, R. C. Dynes, and P. W. Anderson, The flux shuttle—a Josephson junction shift register employing single flux quanta, [Proc. IEEE](https://doi.org/10.1109/PROC.1973.8966) **61**, 28 (1973).
- <span id="page-17-8"></span>[14] T. A. Fulton and L. N. Dunkleberger, Experimental flux shuttle, [Appl. Phys. Lett.](https://doi.org/10.1063/1.1654621) **22**, 232 (1973).
- <span id="page-17-9"></span>[15] O. A. Mukhanov, S. V. Polonsky, and V. K. Semenov, New elements of the RSFQ logic family, [IEEE Trans. Magn.](https://doi.org/10.1109/20.133710) **27**, 2435 (1991).
- <span id="page-17-10"></span>[16] O. A. Mukhanov, Rapid single flux quantum (RSFQ) shift register family, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.233529) **3**, 2578 (1993).
- <span id="page-17-11"></span>[17] W. Wustmann and K. D. Osborn, Reversible fluxon logic: Topological particles allow ballistic gates along one-dimensional paths, [Phys. Rev. B](https://doi.org/10.1103/PhysRevB.101.014516) **101**, 014516 (2020).
- <span id="page-17-13"></span>[18] K. D. Osborn and W. Wustmann, Reversible fluxon logic with optimized CNOT gate components, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2020.3035344) **31**, 1 (2021).
- <span id="page-17-12"></span>[19] K. D. Osborn and W. Wustmann, in *Reversible Computation*. RC 2018. Lecture Notes in Computer Science **11106**, 189 (Springer, Cham, 2018).
- <span id="page-17-14"></span>[20] K. K. Likharev, S. V. Rylov, and V. K. Semenov, Reversible conveyer computation in array of parametric quantrons, [IEEE Trans. Magn.](https://doi.org/10.1109/TMAG.1985.1063673) **21**, 947 (1985).
- <span id="page-17-15"></span>[21] V. K. Semenov, G. V. Danilov, and D. V. Averin, Negative-inductance SQUID as the basic element of reversible Josephson-junction circuits, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2003.814155) **13**, 938 (2003).
- [22] J. Ren, V. K. Semenov, Y. A. Polyakov, D. V. Averin, and J.-S. Tsai, Progress towards reversible computing with nSQUID arrays, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2009.2018250) **19**, 961 (2009).
- <span id="page-17-16"></span>[23] J. Ren and V. K. Semenov, Progress with physically and logically reversible superconducting digital circuits, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2011.2104352) **21**, 780 (2011).
- <span id="page-17-17"></span>[24] N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, Reversible logic gate using adiabatic superconducting devices, [Sci. Rep.](https://doi.org/10.1038/srep06354) **4**, 6354 (2014).
- <span id="page-17-18"></span>[25] M. P. Frank, in [*Asynchronous ballistic reversible computing, 2017 IEEE ICRC conference proceedings*](https://doi.org/10.1109/ICRC.2017.8123659) (2017).
- <span id="page-17-19"></span>[26] M. P. Frank, R. M. Lewis, N. A. Missert, M. D. Henry, M. A. Wolak, and E. P. DeBenedictis, in [*2019 ISEC conference proceedings*](https://doi.org/10.1109/ISEC46533.2019.8990900) (2019).
- <span id="page-17-21"></span>[27] L. Yu, W. Wustmann, and K. D. Osborn, in [*Proc. IEEE Int. Superconductive Electron. Conf.*](https://doi.org/10.1109/ISEC46533.2019.8990914) (2019).
- <span id="page-17-22"></span>[28] R. Rajaraman, *Solitons and Instantons: An Introduction to Solitons and Instantons in Quantum Field Theory* (North-Holland, Amsterdam, 1989).
- <span id="page-17-23"></span>[29] J. Rubinstein, Sine-Gordon equation, [J. Math. Phys.](https://doi.org/10.1063/1.1665057) **11**, 258 (1970).
- <span id="page-17-24"></span>[30] A. Fedorov, A. Shnirman, G. Schön, and A. Kidiyarova-Shevchenko, Reading out the state of a flux qubit by Josephson transmission line solitons, [Phys. Rev. B](https://doi.org/10.1103/PhysRevB.75.224504) **75**, 224504 (2007).
- <span id="page-17-26"></span>[31] A. L. Pankratov, A. V. Gordeeva, and L. S. Kuzmin, Drastic Suppression of Noise-Induced Errors in Underdamped Long Josephson Junctions, [Phys. Rev. Lett.](https://doi.org/10.1103/PhysRevLett.109.087003) **109**, 087003 (2012).
- <span id="page-17-25"></span>[32] A. C. Scott, F. Y. F. Chu, and S. A. Reible, Magnetic-flux propagation on a Josephson transmission line, [J. Appl. Phys.](https://doi.org/10.1063/1.323126) **47**, 3272 (1976).
- <span id="page-17-27"></span>[33] T. Dauxois and M. Peyrard, *Physics of Solitons* (Cambridge University Press, Cambridge, 2006).
- <span id="page-17-29"></span>[34] O. A. Mukhanov, RSFQ 1024-bit shift register for acquisition memory, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.251810) **3**, 3102 (1993).
- [35] M. Hosoya, W. Hioe, K. Takagi, and E. Goto, Operation of a 1-bit quantum flux parametron shift register (latch) by 4-phase 36-GHz clock, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.403181) **5**, 2831 (1995).
- <span id="page-17-33"></span>[36] K. Fujiwara, Y. Yamashiro, N. Yoshikawa, A. Fujimaki, H. Terai, and S. Yorozu, Design and high-speed test of $(4 \times 8)$-bit single-flux-quantum shift register files, [Supercond. Sci. Technol.](https://doi.org/10.1088/0953-2048/16/12/030) **16**, 1456 (2003).
- [37] K. Ishida, M. Tanaka, T. Ono, and K. Inoue, in [*2016 International SoC Design Conference (ISOCC)*](https://doi.org/10.1109/ISOCC.2016.7799755) (2016).
- <span id="page-17-30"></span>[38] M. G. Bautista, P. Gonzalez-Guerrero, D. Lyles, and G. Michelogiannakis, Superconducting shuttle-flux shift register for race logic and its applications, [IEEE Trans. Circuits Syst. I: Regular Papers](https://doi.org/10.1109/TCSI.2022.3210023) **70**, 1 (2022).
- <span id="page-17-31"></span>[39] D. K. Brock, RSFQ technology: Circuits and systems, [Int. J. High Speed Electron. Syst.](https://doi.org/10.1142/S0129156401000861) **11**, 307 (2001).
- <span id="page-17-32"></span>[40] S. Nagasawa, K. Hinode, T. Satoh, Y. Kitagawa, and M. Hidaka, Design of all-dc-powered high-speed single flux quantum random access memory based on a pipeline structure for memory cell arrays, [Supercond. Sci. Technol.](https://doi.org/10.1088/0953-2048/19/5/S34) **19**, S325 (2006).
- [41] V. K. Semenov, Y. A. Polyakov, and S. K. Tolpygo, Very large scale integration of Josephson-junction-based superconductor random access memories, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2019.2904971) **29**, 1302809 (2019).
- <span id="page-18-0"></span>[42] O. Mukhanov, N. Yoshikawa, I. P. Nevirkovets, and M. Hidaka, in *Fundamentals and Frontiers of the Josephson Effect*, edited by F. Tafuri, Springer Series in Materials Science **286** (Springer, Cham, 2019).
- <span id="page-18-1"></span>[43] S. Nagasawa, Y. Hashimoto, H. Numata, and S. Tahara, A 380 ps, 9.5 mW Josephson 4-Kbit RAM operated at a high bit yield, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.403086) **5**, 2447 (1995).
- <span id="page-18-2"></span>[44] S. Nagasawa, H. Hasegawa, T. Hashimoto, H. Suzuki, K. Miyahara, and Y. Enomoto, Design of a 16 kbit superconducting latching/SFQ hybrid RAM, [Supercond. Sci. Technol.](https://doi.org/10.1088/0953-2048/12/11/371) **12**, 933 (1999).
- <span id="page-18-3"></span>[45] H. Takayama, N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, A random-access-memory cell based on quantum flux parametron with three control lines, [J. Phys.: Conf. Ser.](https://doi.org/10.1088/1742-6596/1054/1/012063) **1054**, 012063 (2018).
- <span id="page-18-4"></span>[46] I. A. Dayton, T. Sage, E. C. Gingrich, M. G. Loving, T. F. Ambrose, N. P. Siwak, S. Keebaugh, C. Kirby, D. L. Miller, A. Y. Herr, Q. P. Herr, and O. Naaman, Experimental demonstration of a Josephson magnetic memory cell with a programmable $\pi$-junction, [IEEE Magn. Lett.](https://doi.org/10.1109/LMAG.2018.2801820) **9**, 3301905 (2018).
- <span id="page-18-5"></span>[47] Q.-Y. Zhao, E. A. Toomey, B. A. Butters, A. N. McCaughan, A. E. Dane, S.-W. Nam, and K. K. Berggren, A compact superconducting nanowire memory element operated by nanowire cryotrons, [Supercond. Sci. Technol.](https://doi.org/10.1088/1361-6668/aaa820) **31**, 035009 (2018).
- <span id="page-18-6"></span>[48] A. Murphy, D. V. Averin, and A. Bezryadin, Nanoscale superconducting memory based on the kinetic inductance of asymmetric nanowire loops, [New J. Phys.](https://doi.org/10.1088/1367-2630/aa7331) **19**, 063015 (2017).
- <span id="page-18-7"></span>[49] F. Wang, T. Vogelsang, B. Haukness, and S. C. Magee, in *[2018 IEEE International Memory Workshop \(IMW\)](https://doi.org/10.1109/IMW.2018.8388826)*, Kyoto, Japan (2018).
- <span id="page-18-8"></span>[50] J. Reuben, Rediscovering majority logic in the post-CMOS era: A perspective from in-memory computing, [J. Low Power Electron. Appl.](https://doi.org/10.3390/jlpea10030028) **10**, 28 (2020).
- <span id="page-18-9"></span>[51] I. E. Sutherland and R. F. Sproull, in *Proceedings of the 1991 University of California/Santa Cruz Conference on Advanced Research in VLSI*, edited by Carlo H. Séquin (MIT Press, Cambridge, Massachusetts, 1991).
- <span id="page-18-10"></span>[52] M. Maezawa, I. Kurosawa, M. Aoyagi, H. Nakagawa, Y. Kameda, and T. Nanya, Rapid single-flux-quantum dual-rail logic for asynchronous circuits, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.621796) **7**, 2705 (1997).
- [53] P. Patra, S. Polonsky, and D. S. Fussell, in [*Proceedings Third International Symposium on Advanced Research in Asynchronous Circuits and Systems*](https://doi.org/10.1109/ASYNC.1997.587144) (Eindhoven, Netherlands, 1997), p. 42.
- <span id="page-18-11"></span>[54] G. Tzimpragos, J. Volk, A. Wynn, J. E. Smith, and T. Sherwood, in [*2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)*](https://doi.org/10.1109/ISCA52012.2021.00057) (2021).
- <span id="page-18-12"></span>[55] S. Yorozu, Y. Kameda, and S. Tahara, 60 Gbps throughput demonstration of an asynchronous SFQ-pulse arbitration circuit, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/77.919421) **11**, 621 (2001).
- <span id="page-18-13"></span>[56] S. V. Rylov, Clockless dynamic SFQ AND gate with high input skew tolerance, [IEEE Trans. Appl. Supercond.](https://doi.org/10.1109/TASC.2019.2896137) **29**, 1300805 (2019).
- <span id="page-18-14"></span>[57] N. Takeuchi, Y. Yamanashi, and N. Yoshikawa, Measurement of 10 zJ energy dissipation of adiabatic quantum-flux-parametron logic using a superconducting resonator, [Appl. Phys. Lett.](https://doi.org/10.1063/1.4790276) **102**, 052602 (2013).
- <span id="page-18-15"></span>[58] [www.seeqc.com](https://www.seeqc.com)
- <span id="page-18-16"></span>[59] O. M. Braun and Y. S. Kivshar, Nonlinear dynamics of the Frenkel-Kontorova model, [Phys. Rep.](https://doi.org/10.1016/S0370-1573(98)00029-5) **306**, 1 (1998).