# Voltage-Controlled Topological Spin Switch for Ultralow-Energy Computing: Performance Modeling and Benchmarking

Shaloo Rakheja,<sup>1,\*</sup> Michael E. Flatté,<sup>2</sup> and Andrew D. Kent<sup>3</sup>

<sup>1</sup>Department of Electrical and Computer Engineering, New York University, Brooklyn, New York 11201, USA

<sup>2</sup>Department of Physics and Astronomy and Optical Science and Technology Center, University of Iowa, Iowa City, Iowa 52242, USA

<sup>3</sup> Center for Quantum Phenomena, Department of Physics, New York University, New York, New York 10003, USA

(Received 21 February 2018; revised manuscript received 11 January 2019; published 3 May 2019)

A voltage-controlled topological spin switch (VTOPSS) that uses a hybrid topological insulator-magnetic insulator multiferroic material is presented that can implement Boolean logic operations with sub-10-aJ energy per bit and an energy-delay product on the order of  $10^{-26}$  J s. The device uses a topological insulator, which has the highest efficiency of conversion of the electric field to spin torque yet observed at room temperature, and a low-moment magnetic insulator that can respond rapidly to a given spin torque. We present the theory of operation of the VTOPSS, develop analytic models of its performance metrics, elucidate performance scaling with dimensions and voltage, and benchmark the VTOPSS against existing spin-based and CMOS devices. Compared with existing spin-based devices, such as all-spin logic and charge-spin logic devices, the VTOPSS offers 10–70 times lower energy dissipation and 70–1700 times lower energy-delay product. With experimental advances and improved material properties, we show that the energy and energy-delay product of the VTOPSS can be lowered to a few attojoules per bit and  $10^{-28}$  J s, respectively. As such, the VTOPSS technology offers competitive metrics compared with existing CMOS technology. Finally, we establish that interconnect issues that dominate the performance in CMOS logic are relatively less significant for the VTOPSS, implying that highly resistive materials can indeed be used to interconnect VTOPSS devices.

DOI: 10.1103/PhysRevApplied.11.054009

### **I. INTRODUCTION**

Spin-based logic and memory devices use nanomagnets as digital spin capacitors to store and manipulate information [1]. Typically, spin-polarized electric currents or magnetic fields are used to control the magnetization vector of nanomagnets while reading and writing information [2]. Compared with their charge-based counterparts, spin-based devices offer nonvolatility of information and superior logical efficiency (i.e., fewer devices to implement a given Boolean function [3]). However, most spin-based devices suffer from high energy dissipation resulting from a large electric current density on the order of 10<sup>6</sup> A/cm<sup>2</sup> required to reorient the magnetization vector [4,5]. Such large current densities not only lead to excessive Joule heating in the device but could cause electromigration issues in metallic interconnects [6]. At the same time, reversal of metallic ferromagnetic bodies using antidamping spin-transfer torque (STT) proceeds on a timescale on the order of hundreds of picoseconds to a few nanoseconds [7]. As such, existing spin-based devices have an energy-delay product (EDP) that is 1000 to 10000 times larger than that of their CMOS counterparts [8].

To harness the full potential of spintronics technology, it is imperative to develop methods for energy-efficient and fast manipulation of the magnetic order parameter. Actuation methods, such as voltage control of magnetic anisotropy and coercivity, use of magnetoelectric and exchange coupling in multiferroic-ferromagnetic heterostructures, and charge-carrier-density-mediated ferromagnetism control, have been investigated [9,10]. Yet, these effects are generally weak at room temperature, which limits their practical use. For example, full 180° reversal of a ferromagnet via the magnetoelectric effect requires the assistance of electric currents or magnetic fields or can be accomplished with the resonant pulsed switching mode, which requires precise pulse timing [11–14]. Magnetoelastic effects that are used to tune the magnetic properties of thin films via epitaxial strain or piezoelectric substrates are generally observed in low-aspect-ratio nanomagnets [15,16]. However, in high-aspect-ratio nanomagnets it is difficult to use strain effects to tune the magnetic properties.

<sup>\*</sup>shaloo.rakheja@nyu.edu

A promising research direction is "topological spintronics," which has been driven by the demonstration of efficient room-temperature spin-charge conversion in heterostructures with a topological insulator (TI) interfacing with a ferromagnetic metal [17]. The key property responsible for this advance is the combination of large spin-orbit-coupling (SOC) strength and time-reversal symmetry that leads to the formation of helical Dirac surface states possessing an inherent spin-momentum locking [18–21]. The distinctive feature of TIs is that even without carriers near the chemical potential in the bulk, the spin Hall conductivity can be finite and significantly larger than that of heavy metals such as Pd, Pt, and W [22–24].

Here we use electric fields across a TI, resulting in coherent transport of spins across the material, to generate a spin torque on the magnetization of an adjacent magnetic insulator (MI) layer [25]. Unlike for a ferromagnetic metal, there is no shunting of electric current in the MI layer and current is restricted to flow on the surface of the TI layer. Furthermore, the MI can induce a gap in the TI surface states, rendering the TI surface state insulating, which is (counterintuitively) beneficial for the device operation. The spin-based device, a voltage-controlled topological spin switch (VTOPSS), decouples the elements of a magnetoelectric material [26], allowing us to simultaneously optimize the choice of both TI and MI materials, thereby enabling ultralow-energy computing. One of the most important properties of the MI layer is its low damping [27,28], which is highly desirable in device applications where switching is realized through magnetization precession, as in a VTOPSS.

The transduction principle of charge current to magnetization to charge current is an old one, used prominently by Datta and Das [29] in their first spin-FET proposal, and also in later spin FETs, such as the device proposed by Hall and Flatté [30] in 2006. A similar transduction principle is also used in the charge-spin logic (CSL) device proposed by Datta et al. [31,32] in 2012. Unlike prior device proposals, the VTOPSS is purely electric field driven such that charge-to-magnetization transduction is accomplished without the flow of electric current. The magnetoelectric effect in the VTOPSS allows efficient charge-spin conversion at the TI-MI interface solely by application of an electric field across the TI. Given that a TI possesses a large spin Hall conductivity even without carriers near the chemical potential in its bulk, it avoids the dissipative charge currents present in heavy-metal layers [33] used as spin generators in the CSL device. Although both devices use a magnetic tunnel junction (MTJ) to read the magnetization information, in the VTOPSS the free layer of the MTJ is exchange-coupled to the MI layer, making the read process more robust against the effects of thermal noise.

The remainder of this paper is organized as follows. In Sec. II, the physics of operation of the VTOPSS is presented. In Sec. III, analytic models of performance metrics of the VTOPSS are presented, followed by benchmarking results against existing spin- and charge-based devices in Sec. IV. In Sec. V, implementation of universal Boolean logic gates and the logical efficiency of the VTOPSS resulting from its innate polymorphism are highlighted. Section VI summarizes the key findings of this work while also offering an outlook on future research directions.

### **II. PHYSICS OF OPERATION**

The evolution of the wave functions of the full bands of the TI, under the influence of an electric field, produces coherent transport of spins across the material [34], which can be used to efficiently manipulate the magnetization state of an adjacent magnetic layer [35]. The charge Hall conductivity and bulk dissipative charge currents vanish or are small in the TI, but the spin Hall conductivity is finite and can be much larger than that of a large-SOC metal [36]. A TI has the highest efficiency of conversion of the electric field to spin torque yet observed at room temperature [37,38]. Hybrid TI-MI structures decouple the constituent features of a multiferroic material, allowing independent optimization of both components of the response of magnetization to an electric field (i.e., generation of spin torque from the electric field and response of the magnetic moments to the spin torque).

The total Berry curvature of a full band measures the integrated correlation between spin and orbital degrees of freedom. For a so-called trivial insulator this correlation integrates to zero across the entire full band. Thus, if at one region of the Brillouin zone the wave functions of the band have spin and orbit correlated preferentially parallel, there will be another region of the zone in which the wave functions are correlated preferentially antiparallel. An example is the valence band in a trivial direct-gap semiconductor, such as GaAs, which is of p-orbital character, for which the wave functions near the valence maximum are heavy-hole states, with spin and orbit degrees of freedom parallel, whereas at energies below the split-off energy, the spin and orbit degrees of freedom are preferentially oriented antiparallel.

TIs differ from these trivial insulators in that this spin-orbit correlation does not integrate to zero. The spin-orbit correlation is described quantitatively by the Berry curvature of the band, and thus the electronic ground state in a TI possesses a nonzero integrated Berry curvature. The spin Hall conductivity in the clean static limit, evaluated as the linear response of the spin current to an electric field by the Kubo approach, depends directly on the Berry curvature:

$$\sigma_{yx} = \frac{e\hbar}{V} \sum_{n} \sum_{k} f_{n\mathbf{k}} \Omega_{n\mathbf{k}}^{z}, \qquad (1)$$

where *e* is the elementary charge,  $\hbar$  is the reduced Planck constant, *V* is the volume of the system, **k** is the crystal momentum, *n* is a band index, and  $\Omega_{n\mathbf{k}}^{z}$  is the Berry curvature,

$$\Omega_{n\mathbf{k}}^{z} = 2 \sum_{n \neq n'} \operatorname{Im} \frac{\left\langle u_{n\mathbf{k}} | j_{y}^{z} | u_{n'\mathbf{k}} \right\rangle \left\langle u_{n'\mathbf{k}} | v_{x} | u_{n\mathbf{k}} \right\rangle}{(E_{n\mathbf{k}} - E_{n'\mathbf{k}})^{2}}. \qquad (2)$$

Here the Fermi-Dirac function  $f_{n\mathbf{k}}$  ensures that the sum is over filled states, corresponding to all the filled bands at zero temperature. The spin-current and velocity operators,  $\hat{j}_i^j$  and  $\hat{v}_i$ , are

$$\hat{j}_i^j = \frac{\hbar}{4} (\hat{v}_i \sigma_j + \sigma_j \hat{v}_i), \quad \hbar \hat{v}_i = \nabla_{k_i} \hat{H}, \quad (3)$$

where  $\sigma_j$  is the spin operator along direction *j* and  $\hat{H}$  is the Hamiltonian of the material. The current and velocity operators are evaluated between the states with Bloch functions  $u_{n\mathbf{k}}$  and  $u_{n'\mathbf{k}}$  with energies  $E_{n\mathbf{k}}$  and  $E_{n'\mathbf{k}}$ , respectively.

As the integrated Berry curvature of the full band does not vanish for a TI, and the spin Hall conductivity is directly related to the total Berry curvature of the filled states of the TI, even without any carriers near the chemical potential in the bulk, the spin Hall conductivity does not vanish. This characteristic clearly identifies the spin current involved as nondissipative until it encounters other regions, such as an interface. Here we take advantage of this localized effect to drive the VTOPSS shown in Fig. 1. The device relies on the accumulation of spins at an interface, originating from the voltage  $(V_{in})$  applied to the TI. The spin Hall conductivity for a TI can be as large as (or larger than) that of a large-SOC metal, but the dissipative longitudinal charge current will vanish for the TI. Thus, a TI provides the advantages of a large spin Hall conductivity, but without the intrinsic dissipation of a metallic material. The resulting spin current produced by the electric field on the TI generates a torque on the spin in the magnetic material through exchange coupling or antidamping torque. In the case of effective exchange coupling, the torque forces the magnetization to precess and eventually reverse.

The spin current density created by application of an electric field  $E_{\text{TI}}$  is

$$J_s = \sigma_{yx} E_{\rm TI}. \qquad (4)$$

The resulting magnetization dynamics of the ferromagnetic insulator can be described by the Landau-Lifshitz-Gilbert-Slonczewski equation in a macrospin limit [39]:

$$\frac{1}{\gamma'}\frac{d\hat{\mathbf{m}}}{dt} = -\mu_0\hat{\mathbf{m}} \times \mathbf{H}_{\text{eff}} - \alpha\mu_0\hat{\mathbf{m}} \times \left(\hat{\mathbf{m}} \times \mathbf{H}_{\text{eff}}\right) -\underbrace{c_{\text{ex}}j_s\hat{\mathbf{m}} \times \hat{\mathbf{p}}}_{\text{Fieldlike torque}} + \underbrace{j_s\hat{\mathbf{m}} \times \left(\hat{\mathbf{m}} \times \hat{\mathbf{p}}\right)}_{\text{Slonczewski torque}}, \qquad (5)$$

where  $\hat{\mathbf{m}}$  is a unit vector in the magnetization direction,  $\gamma' = \gamma/(1 + \alpha^2)$ , where  $\gamma$  is the gyromagnetic ratio,  $\mu_0$  is the vacuum permeability, and  $\alpha$  is the Gilbert damping coefficient. The last two terms describe a spin torque from a spin current polarized in a direction  $\hat{\mathbf{p}}$ , generally perpendicular to the electric field and in the plane of the TI-MI interface.  $J_s = 2M_s t_{\text{MI}} j_s$ , where  $M_s$  is the magnetization of the MI layer and  $t_{\rm MI}$  is its thickness. (We assume a thin ferromagnetic insulator with area in contact with the topological insulator  $A_{int}$  and thickness  $t_{MI}$ .) The first spin-torque term describes the precession of the magnetization about the spin-polarization direction, with an exchange-coupling parameter  $c_{ex}$ . (For an estimate of this parameter, see Ref. [35].) This term is often referred to as a fieldlike interaction. The second spin-torque term characterizes the Slonczewski "antidamping" torque, a torque that can oppose the dissipative term (the second term on the right-hand side of the equation), leading to precessional magnetization dynamics and switching.

FIG. 1. Copy and invert functions implemented with the VTOPSS. In the write unit, an input voltage signal applied across the TI layer causes spin accumulation at the interface of the TI and MI layers, which exerts a spin torque on the magnetization of the MI layer to reverse it. The read unit has a MTJ exchange-coupled to the MI layer that allows reading of the information in the MI layer. The polarity of the output voltage can be changed on the fly by change of the polarity of the voltages  $V^+$  and  $V^-$  on the MTJ stack, allowing both inverting and noninverting logic to be realized with the same primitive/layout. Typical system materials include  $Bi_2Se_3/(Bi_xSb_{1-x})_2Te_3$  as the TI,  $Y_3Fe_5O_{12}$ ,  $(Ni_{0.65}Zn_{0.35})(Al_{0.2}Fe_{0.8})O_4$ , BaFe<sub>12</sub>O<sub>19</sub>, or Tm<sub>3</sub>Fe<sub>5</sub>O<sub>12</sub> as the MI, (Co,Fe)B-MgO-(Co,Fe)B/Ru/CoFe/IrMn (synthetic antiferromagnet) as the MTJ, and metallic or semiconducting nanointerconnects with effective resistivity less than  $100 \,\mu\Omega$  cm as wires. GND, ground.

The effective field  $\mathbf{H}_{\text{eff}}$  characterizes the magnetic anisotropy of the free layer. For a uniaxial magnet with easy-magnetization direction in the *y* direction

$$\mathbf{H}_{\rm eff} = 2E_b m_y / (\mu_0 M_s V_{\rm MI}), \tag{6}$$

where  $E_b$  is the energy barrier to magnetization reversal and  $V_{\rm MI}$  is the volume of the MI layer. The magnetization-switching mechanism depends on the orientation of the magnetic easy axis relative to the direction of spin polarization  $\hat{\mathbf{p}}$ . When the two are orthogonal, the switching can occur due to precession about the spin-polarization direction and can be very fast (less than 100 ps) [40–42]. However, typically precise electric pulse timing is required to ensure switching. When the spin polarization is collinear with the easy-axis direction, the switching is slower but the pulse time is not a critical parameter; in general, the write error rate decreases monotonically with either increasing pulse amplitude or increasing pulse duration [43]. The electric field polarity determines the sense of reversal (i.e., from  $m_y = 1$  to  $-1$  and vice versa). The threshold spin current density for antidamping spin-current switching follows from Eq. (5),  $J_{s,th} = 4\alpha E_b/A_{int}$ . The antidamping switching mechanism is considered in the analysis presented in Sec. III.
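The threshold quoted above can be illustrated numerically. The following is a minimal macrospin sketch of Eq. (5) in reduced units, assuming the collinear case ($\hat{\mathbf{p}}$ along the easy *y* axis), with fields and $j_s$ measured in units of $H_K$ and time in units of $1/(\gamma' H_K)$. In these units the antidamping threshold is $j_s = \alpha$, which is equivalent to $J_{s,th} = 4\alpha E_b/A_{int}$ through $J_s = 2M_s t_{\rm MI} j_s$ and Eq. (9). The function names, the fourth-order Runge-Kutta integrator, the initial tilt, and the damping value $\alpha = 0.05$ are illustrative assumptions chosen to keep the run short; they are not taken from the paper.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llgs_rhs(m, alpha, js, c_ex, p=(0.0, 1.0, 0.0)):
    """Right-hand side of Eq. (5) in reduced units.

    Fields and js are in units of H_K; time is in units of 1/(gamma' H_K).
    """
    h = (0.0, m[1], 0.0)  # Eq. (6): uniaxial effective field along y, proportional to m_y
    mxh, mxp = cross(m, h), cross(m, p)
    mxmxh, mxmxp = cross(m, mxh), cross(m, mxp)
    return tuple(-mxh[i] - alpha * mxmxh[i] - c_ex * js * mxp[i] + js * mxmxp[i]
                 for i in range(3))

def rk4_step(m, dt, alpha, js, c_ex):
    """One RK4 step, followed by renormalization to keep |m| = 1."""
    def shifted(k, scale):
        return tuple(m[i] + scale * k[i] for i in range(3))
    k1 = llgs_rhs(m, alpha, js, c_ex)
    k2 = llgs_rhs(shifted(k1, 0.5 * dt), alpha, js, c_ex)
    k3 = llgs_rhs(shifted(k2, 0.5 * dt), alpha, js, c_ex)
    k4 = llgs_rhs(shifted(k3, dt), alpha, js, c_ex)
    new = tuple(m[i] + dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
                for i in range(3))
    norm = math.sqrt(sum(x * x for x in new))
    return tuple(x / norm for x in new)

def simulate(alpha, js, c_ex=0.0, theta0=0.05, t_end=100.0, dt=0.01):
    """Integrate from a small tilt theta0 away from +y; return the final unit vector."""
    m = (math.sin(theta0), math.cos(theta0), 0.0)
    for _ in range(int(round(t_end / dt))):
        m = rk4_step(m, dt, alpha, js, c_ex)
    return m

if __name__ == "__main__":
    alpha = 0.05  # illustrative damping, much larger than the MI values in the text
    print("above threshold, final m_y:", simulate(alpha, js=5.0 * alpha)[1])
    print("below threshold, final m_y:", simulate(alpha, js=0.5 * alpha)[1])
```

With $j_s$ above $\alpha$ the magnetization reverses toward $-\hat{y}$, while below it the initial tilt decays and the state is retained. Reversing the sign of $j_s$ (the electric field polarity) selects the opposite sense of reversal.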

We note a key difference between the current in the VTOPSS and that in a STT device. In a STT device, the current, carried by individual carriers, is parallel to the applied electric field and Joule heating is produced. That contrasts with our device, in which the current flows perpendicular to the surface of the topological insulator (not along it) and is perpendicular to the applied electric field, making it nondissipative. The current itself in the bulk of the topological insulator is carried by the full band (not individual carriers), corresponding to a protected spin current similar to the quantized charge current in a quantum Hall state, which cannot scatter. Moreover, the current in the VTOPSS does not need to be on during the entire switching cycle since we are using the spin accumulation that results from the applied voltage to switch the device.

As shown in Fig. 1, the readout in the VTOPSS is accomplished by exchange coupling a small section of the MI layer (storing information) to the free layer of a MTJ, which could operate with sub-100-mV supply voltages ( $V^+$  or  $V^-$ ) to generate sufficient output voltage ( $V_{out}$ ) with intrinsic gain and the ability to fan out. This separates the robust information-storage aspect from the transduction within a hybrid magnetoelectric device, allowing one to probe the magnetization without disturbing the state.

While the device schematic in Fig. 1 shows the TI and the MTJ integrated laterally on the MI layer, the layers can be integrated vertically to achieve a smaller footprint of the device and higher integration density. The top view of the device for lateral and vertical integration schemes is shown in Fig. 2. In Sec. III, we highlight key differences in the scaling behavior of delay and energy dissipation with device dimensions for both layouts. For the purpose of performance benchmarking in Sec. IV, we consider only the vertically integrated VTOPSS.



FIG. 2. Vertically integrated layers (left) and laterally integrated layers (right) in the VTOPSS. The interface area in the vertical layout is  $A_{int} = W_{MI}L_{MI}$ , while that in the lateral layout is  $A_{int} = W_{TI}L_{TI}$ .

### **III. PERFORMANCE MODELING**

In most spin-based devices, the operating speed is limited by the time it takes to reverse the magnetization of the metallic ferromagnetic layer, which is typically on the order of 1 ns. Spin-based devices using the spin Hall effect in heavy metals, such as Pt, Pd, and W, require large electric fields in the heavy metal to generate sufficient electric current to cause STT switching of nanomagnets. The VTOPSS takes advantage of the unique properties of TI and MI material systems to achieve the following criteria for energy-efficient logic applications: (i) nonvolatility of operational states, (ii) fully-voltage-driven switching of the MI magnetization with voltages less than 100 mV, (iii) absence of dissipative electric currents during the write process, and (iv) ultrafast switching of the MI magnetization due to its low Gilbert damping.

In this section, analytic models of latency and energy dissipation of the VTOPSS are presented, followed by a comparison of metrics against those of existing spin-based and charge-based devices. Analytic models are obtained for a uniaxial MI layer subjected to antidamping STT resulting from spin accumulation at the TI-MI interface when the TI is subjected to an external electric field. Micromagnetic effects in magnetization reversal, such as reversed domain nucleation and expansion in the MI layer, are not considered in this paper. Experiments have shown that such multidomain effects could lead to faster magnetization reversal, particularly in the absence of defects and pinning sites in the magnet [44,45]. The essential physics of magnetization reversal is determined by the transfer of angular momentum (the total angular momentum that needs to be switched and the rate at which angular momentum can be changed by the interactions present). As such, the macrospin model provides the correct order of magnitude for magnetization-switching thresholds and times even in the presence of nucleation and reversed domain expansion. Here we adopt the macrospin approximation, which affords analytic insight into the device performance limits and can be used readily for the optimization of device geometry and materials.

#### A. Device latency

To estimate VTOPSS latency, the rate of spin accumulation at the TI-MI interface must be calculated. For a given electric field ( $E_{\text{TI}}$ ) and spin Hall conductivity ( $\sigma_{\text{SHC}}$ ) of the TI layer, the accumulation rate of interface spins is given as

$$\frac{dn_{\rm spins}}{dt} = \frac{\sigma_{\rm SHC}}{\hbar/2} E_{\rm TI} = \frac{\sigma_{\rm SHC}}{\hbar/2} \frac{V_{\rm in}}{W_{\rm TI}},\tag{7}$$

where  $V_{\rm in}$  is the voltage applied across the TI layer and  $W_{\rm TI}$  is the width of the TI layer, measured along the y axis in Fig. 1 and also marked in Fig. 2. For a given efficiency  $\varepsilon$  of coupling of spins at the TI-MI interface and the magnetic moment of the MI layer, the following condition is satisfied:

$$N_{\rm spins,MI} = \varepsilon n_{\rm spins} \mathcal{A}_{\rm int}, \tag{8}$$

where  $N_{\text{spins},\text{MI}}$  is the total number of spins in the MI layer subjected to spin torque due to the interface spin accumulation and  $\mathcal{A}_{\text{int}}$  is the interface cross-section area. The total number of spins in a magnetic body is given as

$$N_{\rm spins,MI} = \frac{M_s V_{\rm MI}}{\mu_B} = \frac{2E_b}{\mu_B H_K},\tag{9}$$

where  $H_K$  is the anisotropy field of the MI layer and  $\mu_B = 9.3 \times 10^{-24}$  J/T is the Bohr magneton. Assuming antidamping switching of the MI layer in the ballistic limit  $(J_{\rm MI} \gg J_{\rm th})$ , the rate of spin accumulation at the interface will balance the rate of magnetization reversal of the MI layer. Here  $J_{\rm MI}$  is the input spin current density in the MI layer, while  $J_{\rm th}$  is the threshold spin current density required for STT-induced magnetization reversal. In this case, the reversal time of the MI layer is [44]

$$\tau_{\rm MI,0} = \frac{N_{\rm spins,MI}}{\varepsilon \mathcal{A}_{\rm int} dn_{\rm spins}/dt} = \frac{2eE_b}{\varepsilon \mu_B H_K \sigma_0 V_{\rm in}} \frac{W_{\rm TI}}{\mathcal{A}_{\rm int}},\qquad(10)$$

where  $\sigma_{\text{SHC}} = \sigma_0(\hbar/2e)$ . The interface area depends on the device layout as indicated in the caption for Fig. 2. The STT-driven reversal of the MI layer will be subject to thermal fluctuations, which could result in switching errors. To account for switching errors ( $\mathcal{P}_{\text{err}}$ ), the switching delay of the MI layer is modified according to

$$\tau_{\rm MI} = \tau_{\rm MI,0} \eta_{\rm st}, \qquad (11a)$$

$$\eta_{\rm st} = -0.5 \ln \left( -\frac{\ln(1-\mathcal{P}_{\rm err})}{4E_b/k_B T} \right). \tag{11b}$$

For  $\mathcal{P}_{err} = 10^{-12}$  and  $E_b = 30k_BT$ , the stochastic parameter  $\eta_{st} \approx 16$ , which gives  $\tau_{MI} = 16\tau_{MI,0}$ . With new error-tolerant computing paradigms related to approximate and probabilistic computing with spin-based devices, one may not need such low error rates [46–48]. For  $\mathcal{P}_{err} = 0.5$ ,  $\eta_{st} \sim 2.58$ . We choose  $\eta_{st} = 10$  and  $E_b = 30k_BT$  ( $\mathcal{P}_{err} = 10^{-7}$ ) for performance benchmarking in Sec. IV. These parameters ( $E_b = 30k_BT$  and  $\mathcal{P}_{err} = 10^{-7}$ ) allow us to have a fair and consistent comparison of the VTOPSS against existing spin-based devices, including all-spin logic (ASL), magnetoelectric spin-orbit (MESO) logic, and CSL devices, in which the energy barrier of the magnetic layer is approximately (5–40) $k_BT$  [31,49–51]. Issues pertaining to circuit- and system-level error analyses are beyond the scope of this work.
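The values of $\eta_{st}$ quoted above can be reproduced directly from Eq. (11b). The short sketch below does this; the function name `eta_st` is hypothetical, and `math.log1p` is used only so that $\ln(1-\mathcal{P}_{err})$ stays accurate for very small error probabilities.

```python
import math

def eta_st(p_err, e_b0):
    """Stochastic delay multiplier of Eq. (11b).

    p_err : target switching error probability
    e_b0  : normalized energy barrier E_b / (k_B T)
    """
    # log1p(-p) evaluates ln(1 - p) without cancellation when p is tiny
    return -0.5 * math.log(-math.log1p(-p_err) / (4.0 * e_b0))

if __name__ == "__main__":
    for p in (1e-12, 1e-7, 0.5):
        print(f"P_err = {p:g}: eta_st = {eta_st(p, 30.0):.2f}")
```

Because $\eta_{st}$ depends only logarithmically on $\mathcal{P}_{err}$, relaxing the error target from $10^{-12}$ to $10^{-7}$ shortens the delay by well under a factor of 2.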

Increases in  $\sigma_{SHC}$  of the TI layer and the spin-coupling efficiency are particularly beneficial toward reducing the device latency. While the total latency of the device must include the time needed to charge and discharge the device capacitance (sum of interconnect and TI input capacitance), the analysis presented in Sec. III E shows that the dominant time constant is due to the rate of spin accumulation at the TI-MI interface.
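The dependence of the latency on $\sigma_{\rm SHC}$ and $\varepsilon$ can be read off Eq. (10). As a consistency check, its second equality is reproduced here by substituting Eqs. (7) and (9) together with the definition $\sigma_{\rm SHC} = \sigma_0(\hbar/2e)$; this is a worked restatement and introduces no new assumptions:

$$\frac{\sigma_{\rm SHC}}{\hbar/2} = \frac{\sigma_0(\hbar/2e)}{\hbar/2} = \frac{\sigma_0}{e} \quad\Rightarrow\quad \frac{dn_{\rm spins}}{dt} = \frac{\sigma_0}{e}\frac{V_{\rm in}}{W_{\rm TI}},$$

$$\tau_{\rm MI,0} = \frac{2E_b}{\mu_B H_K}\,\frac{1}{\varepsilon \mathcal{A}_{\rm int}}\,\frac{e W_{\rm TI}}{\sigma_0 V_{\rm in}} = \frac{2eE_b}{\varepsilon \mu_B H_K \sigma_0 V_{\rm in}} \frac{W_{\rm TI}}{\mathcal{A}_{\rm int}}.$$

The latency is therefore inversely proportional to $\sigma_0$, $\varepsilon$, and $V_{\rm in}$, and proportional to $E_b$ and to the geometric ratio $W_{\rm TI}/\mathcal{A}_{\rm int}$.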

#### B. Thermal stability

The reversal delay of the MI layer can be reduced by lowering of the energy barrier  $E_b$ ; however, this comes at the cost of reduced thermal stability of the MI layer. The reversal time of a uniaxial magnet subjected only to thermal noise is given as [52,53]

$$\tau_{\text{stable}} = \frac{\sqrt{\pi} \left( \alpha + \alpha^{-1} \right)}{2\gamma H_K \sqrt{e_{b0}}} \left( 1 + \frac{1}{e_{b0}} + \frac{7}{4e_{b0}^2} \right) e^{e_{b0}}, \quad (12)$$

where  $e_{b0} = E_b/k_B T$  is the normalized energy barrier of the magnet. For  $\alpha = 3 \times 10^{-3}$ ,  $H_K = 0.1$  T, and  $e_{b0} =$ 30, we find  $\tau_{\text{stable}} = 3.39 \times 10^4 \text{ s}$  (0.392 days). The stability increases for  $\alpha = 10^{-4}$ , with  $\tau_{\text{stable}} = 1.0171 \times 10^{6}$  s (11.8 days). For most computing applications with clock frequency  $f_{\text{clock}} = 3$  GHz and activity factor  $\alpha_{\text{act}} = 10\%$ (1%), a device will switch on average once per 3.3 ns (33 ns). Over the device-relevant computational time  $(\Delta t)$ , the probability of error at a fixed  $\tau_{\text{stable}}$  is  $\mathcal{P}_{\text{err}} =$  $1 - \exp(-\Delta t / \tau_{\text{stable}})$ . For  $\Delta t = 33$  ns and  $\tau_{\text{stable}} = 3.39 \times$  $10^4$  s,  $\mathcal{P}_{err} = 9.73 \times 10^{-13}$ . As shown in Fig. 3, for  $e_{b0} =$ 30, the error rate stays well under  $10^{-10}$  for  $\Delta t = 100$  ns (corresponding to  $\alpha_{act} = 0.33\%$  at  $f_{clock} = 3$  GHz). These results quantitatively show that an energy barrier of  $30k_BT$ could provide sufficient thermal stability for most computing applications.
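The retention times and error probabilities quoted above follow from Eq. (12) and $\mathcal{P}_{\text{err}} = 1 - \exp(-\Delta t/\tau_{\text{stable}})$. The sketch below evaluates both. It assumes the free-electron gyromagnetic ratio $\gamma = 1.76 \times 10^{11}$ rad s$^{-1}$ T$^{-1}$, which the text does not state explicitly, and the function names are hypothetical.

```python
import math

GAMMA = 1.76e11  # rad s^-1 T^-1; assumed free-electron gyromagnetic ratio

def tau_stable(alpha, h_k, e_b0, gamma=GAMMA):
    """Thermal reversal time of a uniaxial magnet, Eq. (12).

    alpha : Gilbert damping
    h_k   : anisotropy field in tesla
    e_b0  : normalized energy barrier E_b / (k_B T)
    """
    prefactor = (math.sqrt(math.pi) * (alpha + 1.0 / alpha)
                 / (2.0 * gamma * h_k * math.sqrt(e_b0)))
    series = 1.0 + 1.0 / e_b0 + 7.0 / (4.0 * e_b0 ** 2)
    return prefactor * series * math.exp(e_b0)

def retention_error(delta_t, tau):
    """P_err = 1 - exp(-delta_t / tau), evaluated with expm1 for tiny arguments."""
    return -math.expm1(-delta_t / tau)

def switching_interval(f_clock, activity):
    """Average time between switching events of one device."""
    return 1.0 / (f_clock * activity)

if __name__ == "__main__":
    tau = tau_stable(alpha=3e-3, h_k=0.1, e_b0=30.0)
    print(f"tau_stable = {tau:.3e} s ({tau / 86400.0:.3f} days)")
    print(f"P_err over 33 ns  = {retention_error(33e-9, tau):.3e}")
    print(f"P_err over 100 ns = {retention_error(100e-9, tau):.3e}")
    print(f"interval at 3 GHz, 10% activity = {switching_interval(3e9, 0.10) * 1e9:.2f} ns")
```

Since $\alpha + \alpha^{-1} \approx \alpha^{-1}$ for small damping, $\tau_{\text{stable}}$ scales almost exactly as $1/\alpha$, which accounts for the factor of about 30 between the two damping values considered.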

#### C. Minimum input voltage

FIG. 3. Required energy barrier in a uniaxial magnet subjected to a random thermal field for a given probability of error within a time duration  $\Delta t$.

The input voltage,  $V_{in}$, across the TI is a critical parameter that affects the dynamics and the switching time of the VTOPSS. The model in Eq. (11) is valid only when  $J_{MI} > 2J_{th}$. Therefore, an estimate of the minimum input voltage  $V_{in}^{min,ST}$  is found by considering  $J_{MI} = 2J_{th}$, which results in

$$V_{\rm in}^{\rm min,ST} = \frac{8e\alpha E_b}{\hbar\sigma_0} \frac{W_{\rm TI}}{\mathcal{A}_{\rm int}}. \qquad (13)$$

From thermodynamic considerations, however, the minimum input voltage will be limited by the Johnson-Nyquist (JN) noise. Considering that the output node of the VTOPSS can be represented as a low-pass *RC* filter, the spectral density of the thermal noise is given as  $V_n^2 = 4k_BTR_{eq}$ , where  $R_{eq}$  is the equivalent resistance for charging and discharging the output capacitor,  $C_{out}$ . To find the total noise, we integrate the spectral density over the noise bandwidth  $\Delta f$ , which is given as

$$\Delta f = \frac{1}{2\pi} \int_0^\infty \frac{d\omega}{1 + \left(\omega R_{\rm eq} C_{\rm out}\right)^2} = \frac{1}{4R_{\rm eq} C_{\rm out}}. \qquad (14)$$

The total noise voltage across the output capacitor is therefore given as  $v_n^{\min,\text{JN}} = \sqrt{V_n^2 \Delta f} = \sqrt{k_B T / C_{\text{out}}}$ .
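For completeness, the intermediate steps behind Eq. (14) and the resulting noise voltage can be written out with the substitution $u = \omega R_{\rm eq} C_{\rm out}$:

$$\Delta f = \frac{1}{2\pi R_{\rm eq} C_{\rm out}} \int_0^\infty \frac{du}{1+u^2} = \frac{1}{2\pi R_{\rm eq} C_{\rm out}} \Big[\arctan u\Big]_0^\infty = \frac{1}{4R_{\rm eq} C_{\rm out}},$$

$$\left(v_n^{\min,\text{JN}}\right)^2 = V_n^2 \Delta f = 4k_BTR_{\rm eq} \cdot \frac{1}{4R_{\rm eq} C_{\rm out}} = \frac{k_B T}{C_{\rm out}}.$$

The equivalent resistance cancels, so the noise floor is set by the output capacitance and temperature alone.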

If we combine the limits due to spin-torque reversal and JN noise in the system, the minimum input voltage for the VTOPSS is given as

$$V_{\rm in}^{\rm min} = \max(V_{\rm in}^{\rm min,ST}, v_n^{\rm min,JN}). \qquad (15)$$

For  $\alpha = 3 \times 10^{-3}$ ,  $E_b = 30k_BT$ ,  $W_{\text{TI}} = 100$  nm,  $\mathcal{A}_{\text{int}} = 50 \times 100$  nm<sup>2</sup>, and  $\sigma_0 = 2 \times 10^5$  ( $\Omega$  m)<sup>-1</sup>, we get  $V_{\text{in}}^{\text{min,ST}} \sim 0.5$  mV. For  $C_{\text{out}} \approx 20$  aF (sum of interconnect capacitance and capacitance of the fan-out TI layer),  $v_n^{\text{min,JN}} \sim 15$  mV. Therefore, in this case,  $V_{\text{in}}^{\text{min}} = v_n^{\text{min,JN}} = 15$  mV. For  $V_{\text{in}} = 100$  mV,  $\varepsilon = 0.5$ , and  $\eta_{\text{st}} = 10$ ,  $\tau_{\text{MI}} = 860$  ps. The other parameters are the same as noted earlier. With larger spin Hall conductivity, we can expect the reversal delay of the VTOPSS to be well under 500 ps even for an error rate of  $10^{-7}$.
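The numbers in this paragraph can be checked with a few lines of arithmetic based on Eqs. (10), (11a), (13), and (15). The sketch below assumes $T = 300$ K and takes $H_K = 0.1$ T from Sec. III B as the value implied by "the other parameters are the same as noted earlier"; both are assumptions on our reading, as are the function names.

```python
import math

E_CHARGE = 1.602176634e-19  # C
HBAR = 1.054571817e-34      # J s
K_B = 1.380649e-23          # J/K
MU_B = 9.2740100783e-24     # J/T

def v_in_min_st(alpha, e_b, sigma0, w_ti, a_int):
    """Spin-torque-limited minimum input voltage, Eq. (13)."""
    return 8.0 * E_CHARGE * alpha * e_b / (HBAR * sigma0) * w_ti / a_int

def v_noise_jn(c_out, temp):
    """Johnson-Nyquist noise voltage on the output capacitor, sqrt(k_B T / C_out)."""
    return math.sqrt(K_B * temp / c_out)

def tau_mi(e_b, h_k, sigma0, v_in, w_ti, a_int, eps, eta):
    """Reversal delay including the stochastic factor, Eqs. (10) and (11a)."""
    tau0 = 2.0 * E_CHARGE * e_b / (eps * MU_B * h_k * sigma0 * v_in) * w_ti / a_int
    return eta * tau0

TEMP = 300.0               # K, assumed room temperature
E_B = 30.0 * K_B * TEMP    # 30 k_B T energy barrier
W_TI = 100e-9              # m
A_INT = 50e-9 * 100e-9     # m^2
SIGMA0 = 2e5               # (ohm m)^-1
H_K = 0.1                  # T, assumed carried over from Sec. III B

v_st = v_in_min_st(3e-3, E_B, SIGMA0, W_TI, A_INT)
v_jn = v_noise_jn(20e-18, TEMP)
v_min = max(v_st, v_jn)    # Eq. (15)
delay = tau_mi(E_B, H_K, SIGMA0, 0.1, W_TI, A_INT, eps=0.5, eta=10.0)

if __name__ == "__main__":
    print(f"V_in^min,ST = {v_st * 1e3:.2f} mV")
    print(f"v_n^min,JN  = {v_jn * 1e3:.1f} mV")
    print(f"V_in^min    = {v_min * 1e3:.1f} mV")
    print(f"tau_MI      = {delay * 1e12:.0f} ps")
```

Under these assumptions the spin-torque limit lies more than an order of magnitude below the noise limit, so the thermal noise on the output node sets the minimum voltage, consistent with the text.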

#### D. Minimum read voltage

In the VTOPSS, the read unit is a MTJ stack coupled to the MI layer as depicted in Fig. 1. The read voltages are labeled as  $V^+$  and  $V^-$  and have the same magnitude but opposite polarity; that is,  $V^+ = V_{\text{read}}$  and  $V^- = -V_{\text{read}}$ , where  $V_{\text{read}}$  is the magnitude of the voltage applied to the MTJ to read the magnetization state of the MI layer. The supply voltages are clocked such that the writing and reading of a given logic stage happen in concurrent cycles. The voltage generated at the output node  $V_{\text{out}}$  of the *n*th stage drives the write unit of the (n + 1)th stage. The output voltage must meet the  $V_{\text{in}}^{\min}$  criterion in Eq. (15).

The equivalent-circuit model of the VTOPSS is shown in Fig. 4. The interconnect is modeled as a lumped RC network with  $R_{ic}$ and  $C_{ic}$ representing the total interconnect resistance and capacitance, respectively [54]. For a given interconnect length  $L_{ic}$,  $R_{ic} = r_{ic}L_{ic}$ and  $C_{ic} = c_{ic}L_{ic}$, where  $r_{\rm ic}$  and  $c_{\rm ic}$  are the per-unit-length interconnect resistance and capacitance, respectively. The per-unit-length interconnect resistance  $r_{\rm ic} = \rho_{\rm eff} / A_{\rm ic}$ , where  $\rho_{\rm eff}$  is the effective resistivity of the wire and depends on the material as well as grain-boundary and sidewall carrier scatterings in interconnects with scaled cross-section area [55], while  $A_{ic} = 2W_{ic}^2$  is the cross-section area of the interconnect assuming that the interconnect thickness is twice the interconnect width,  $W_{ic}$. The conductances of the parallel and antiparallel configurations of the free layer and the fixed layer in the MTJ stack are denoted as  $G_{\rm P}$  and  $G_{\rm AP}$ , respectively. These conductances are typically determined from tunneling-magnetoresistance (TMR) and resistance-area-product (RA) measurements of the MTJ structure. TMR is given as  $(G_{\rm P} - G_{\rm AP})/G_{\rm P}$ , while the RA is given as  $\mathcal{A}_{\rm MTJ}/(G_{\rm P}+G_{\rm AP})$ , where  $\mathcal{A}_{\rm MTJ}$  is the cross-section area of the MTJ stack. For all results reported in this paper,  $\mathcal{A}_{\rm MTJ} = \mathcal{A}_{\rm int}$  unless otherwise specified.
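To make the circuit parameters concrete, the sketch below converts measured TMR and RA values into $G_{\rm P}$ and $G_{\rm AP}$ by inverting the two definitions exactly as stated above, and evaluates the lumped interconnect $R_{ic}$ and $C_{ic}$. All numerical inputs (TMR, RA, wire width and length, per-unit-length capacitance) are hypothetical values chosen for illustration, except the resistivity of $100\,\mu\Omega$ cm, which is the upper bound mentioned in the caption of Fig. 1. The function names are likewise hypothetical.

```python
import math

def mtj_conductances(tmr, ra, area):
    """Solve TMR = (G_P - G_AP)/G_P and RA = area/(G_P + G_AP) for (G_P, G_AP).

    tmr  : dimensionless TMR as defined in the text (0 <= tmr < 1)
    ra   : resistance-area product in ohm m^2, as defined in the text
    area : MTJ cross-section area in m^2
    """
    g_p = area / (ra * (2.0 - tmr))   # from G_P * (2 - TMR) = area / RA
    g_ap = g_p * (1.0 - tmr)
    return g_p, g_ap

def interconnect_rc(rho_eff, w_ic, l_ic, c_per_len):
    """Lumped R_ic and C_ic for a wire whose thickness is twice its width."""
    a_ic = 2.0 * w_ic ** 2            # A_ic = 2 W_ic^2
    r_ic = rho_eff / a_ic * l_ic      # R_ic = r_ic * L_ic
    c_ic = c_per_len * l_ic           # C_ic = c_ic * L_ic
    return r_ic, c_ic

if __name__ == "__main__":
    area = 50e-9 * 100e-9             # m^2, taking A_MTJ equal to A_int
    g_p, g_ap = mtj_conductances(tmr=0.5, ra=1e-12, area=area)  # hypothetical TMR, RA
    r_ic, c_ic = interconnect_rc(rho_eff=1e-6,      # ohm m (100 micro-ohm cm)
                                 w_ic=10e-9,        # m, hypothetical
                                 l_ic=100e-9,       # m, hypothetical
                                 c_per_len=1e-10)   # F/m, hypothetical
    print(f"G_P = {g_p * 1e3:.3f} mS, G_AP = {g_ap * 1e3:.3f} mS")
    print(f"R_ic = {r_ic:.0f} ohm, C_ic = {c_ic * 1e18:.1f} aF")
```

Keeping the inversion in one helper makes it easy to sweep TMR or RA and observe the effect on the read-path conductances without changing the rest of the circuit model.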

FIG. 4. Equivalent electric circuit of the read unit of a stage driving the write unit of the following stage. The MTJ stack conductances are given as $G_P$ (parallel) and $G_{AP}$ (antiparallel). The total interconnect resistance and capacitance are $R_{ic}$ and $C_{ic}$. The TI layer is modeled as a leaky capacitor with capacitance $C_{TI}$ and shunt conductance $G_{TI}$. The MTJ read voltages have the same magnitude but opposite polarity.

The capacitance of the TI layer is $C_{\text{TI}}$, while the leakage of electric current through the TI is modeled with the leakage conductance $G_{\text{TI}}$. The TI capacitance is given as $C_{\text{TI}} = \epsilon_0 \epsilon_r \mathcal{A}_{\text{int}} / W$, where $\epsilon_0 = 8.85 \times 10^{-12}$ F/m and $\epsilon_r$ is the static relative dielectric permittivity of the TI layer. For Bi<sub>2</sub>Se<sub>3</sub>, $\epsilon_r \approx 110$ [56]. The leakage conductance is $G_{\text{TI}} = G_{\text{sheet}}L/W$, where $G_{\text{sheet}} = en_s\mu$ [57], with $n_s$ and $\mu$ denoting the density and the effective mobility of surface carriers, respectively.
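The two TI expressions can be evaluated directly. The following is a minimal sketch with Table II values; which physical dimensions enter as $\mathcal{A}_{\rm int}$, $W$, and $L/W$ is an assumption made here ($\mathcal{A}_{\rm int} = 50 \times 100$ nm<sup>2</sup>, $W = 50$ nm, and $L/W = 1$ for the leakage path), and the sheet conductance is taken as the inverse of the 100 k$\Omega/\Box$ sheet resistance.

```python
EPS0 = 8.85e-12  # vacuum permittivity in F/m, as quoted in the text


def ti_capacitance(eps_r, area_int, w):
    """C_TI = eps0 * eps_r * A_int / W (parallel-plate form from the text)."""
    return EPS0 * eps_r * area_int / w


def ti_leakage_conductance(r_sheet, length, width):
    """G_TI = G_sheet * L / W, with G_sheet = 1 / R_sheet."""
    return (1.0 / r_sheet) * length / width


# Assumed geometry: A_int = 50 nm x 100 nm and W = 50 nm.
c_ti = ti_capacitance(eps_r=110.0, area_int=50e-9 * 100e-9, w=50e-9)
c_ic = 1.6e-10 * 100e-9      # c_ic * L_ic for 1.6 pF/cm and 100 nm
c_out = c_ti + c_ic          # net capacitive load at the output node
g_ti = ti_leakage_conductance(r_sheet=100e3, length=100e-9, width=100e-9)
print(f"C_TI = {c_ti * 1e18:.1f} aF, C_out = {c_out * 1e18:.1f} aF, G_TI = {g_ti:.1e} S")
```

Under these assumptions the output-node capacitance comes out near 113 aF, the value of $C_{\rm out}$ used in the energy results of Sec. IV A.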

The leakage in the TI layer results from the conductance of topologically trivial and nontrivial surface states as well as the bulk conductivity resulting from unavoidable self-doping effects [58]. Attempts to suppress bulk conductivity include thinning the TI layer until the surface contribution dominates or use of compensation doping to suppress free carriers in the bulk. For example, in the work reported in Ref. [59], copper doping was used in Bi<sub>2</sub>Se<sub>3</sub> films to fully suppress bulk states and decouple the surface states in samples as thin as 20 nm. A sheet resistance of approximately $1000\ \Omega/\Box$ at room temperature (300 K) was experimentally measured in a 20-nm-thick Bi<sub>2</sub>Se<sub>3</sub> film, while the sheet resistance increased to 1400 and 3000 $\Omega/\Box$ in films of thickness 10 and 2 nm, respectively, in the same sample. More recently, sheet resistances on the order of tens of kilo-ohms per square have been experimentally achieved at room temperature in 5–60-nm-thick Bi<sub>2</sub>Se<sub>3</sub> films grown on an insulating $In_2Se_3/(Bi_{0.5}In_{0.5})_2Se_3$ buffer layer on sapphire substrates [60].

The time-domain response of the output voltage, obtained by solving Kirchhoff's laws for the circuit shown in Fig. 4, is given as

$$V_{\text{out}}(t) = V_f \left( 1 - e^{-\frac{t}{\tau_{\text{eq}}}} \right) + V_i e^{-\frac{t}{\tau_{\text{eq}}}}, \qquad (16a)$$

$$V_f = \frac{\Delta G}{G_t} \frac{V_{\text{read}}}{1 + R_{\text{int}}G_{\text{TI}}} = \nu V_{\text{read}}, \qquad (16b)$$

$$\tau_{\rm eq} = \underbrace{\frac{1 + R_{\rm int}G_t}{G_t \left(1 + R_{\rm int}G_{\rm TI}\right)}}_{R_{\rm eq}} C_{\rm out}, \tag{16c}$$

where $\Delta G = G_{\rm P} - G_{\rm AP}$, $G_t = G_{\rm P} + G_{\rm AP}$, and $V_f$ and $V_i$ are the final and initial voltages, respectively, at the output node. At the end of the read or write cycle, the voltage $V_{\rm out}$ is reset to 0 V. Therefore, for all results presented in this paper, $V_i = 0$ V. The minimum read voltage required on the MTJ stack to ensure correct functionality is obtained by equating Eqs. (15) and (16). Assuming that the read pulse duration is significantly greater than $\tau_{\rm eq}$, $V_{\rm read}^{\rm min}$ is given as

$$V_{\text{read}}^{\min} = (1 + R_{\text{int}}G_{\text{TI}}) \frac{G_t}{\Delta G} V_{\text{in}}^{\min}. \qquad (17)$$
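A minimal numerical sketch of Eqs. (16) and (17) is given below. The conductances, interconnect resistance, leakage conductance, and capacitance are illustrative values chosen here, not results from this work; 15 mV is used as the minimum input voltage.

```python
import math


def nu(g_p, g_ap, r_int, g_ti):
    """Voltage-transfer ratio V_f / V_read from Eq. (16b)."""
    return (g_p - g_ap) / (g_p + g_ap) / (1.0 + r_int * g_ti)


def tau_eq(g_p, g_ap, r_int, g_ti, c_out):
    """Time constant of the output node from Eq. (16c)."""
    g_t = g_p + g_ap
    return (1.0 + r_int * g_t) / (g_t * (1.0 + r_int * g_ti)) * c_out


def v_out(t, v_read, v_i, g_p, g_ap, r_int, g_ti, c_out):
    """Output voltage at time t from Eq. (16a)."""
    v_f = nu(g_p, g_ap, r_int, g_ti) * v_read
    decay = math.exp(-t / tau_eq(g_p, g_ap, r_int, g_ti, c_out))
    return v_f * (1.0 - decay) + v_i * decay


def v_read_min(v_in_min, g_p, g_ap, r_int, g_ti):
    """Minimum read voltage from Eq. (17); valid for pulses much longer than tau_eq."""
    return v_in_min / nu(g_p, g_ap, r_int, g_ti)


# Illustrative (assumed) circuit values in SI units.
params = dict(g_p=4e-6, g_ap=1e-6, r_int=50.0, g_ti=1e-5)
v_min = v_read_min(15e-3, **params)
# After a pulse much longer than tau_eq, the output settles at V_in_min.
v_settled = v_out(1e-6, v_min, 0.0, c_out=1e-16, **params)
print(f"V_read_min = {v_min * 1e3:.2f} mV, settled V_out = {v_settled * 1e3:.2f} mV")
```

Driving the MTJ at exactly $V_{\rm read}^{\min}$ makes the settled output equal to $V_{\rm in}^{\min}$, which is the condition Eq. (17) encodes.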

FIG. 5. Ratio of the minimum read voltage to the minimum input voltage as a function of the TMR. The left inset shows the scaling of the TMR with the *RA* (experimental data taken from Ref. [61]). The right inset shows the impact of the TMR on the total conductance ($G_t$) and the differential conductance ($\Delta G$) of the MTJ for two values of the cross-section area, $\mathcal{A}_{MTJ} = 30 \times 30 \text{ nm}^2$ (solid line) and 50 × 50 nm<sup>2</sup> (dashed line). Increasing the MTJ cross-section area increases all conductance values without affecting the ratio of the minimum read voltage and the minimum input voltage.

In Fig. 5, the ratio of the minimum read voltage to the minimum input voltage is plotted as a function of the TMR of the MTJ in the VTOPSS. The values of the TMR and the *RA* are taken from Ref. [61] and are reproduced in the inset in Fig. 5. As the *RA* increases, the TMR increases in proportion and saturates at approximately 350% for *RA* greater than 1000 $\Omega\,\mu\text{m}^2$. The minimum read voltage needed to reliably read the state of the VTOPSS is approximately $2V_{\text{in}}^{\min}$ for TMR in excess of 200% (dotted line in the main plot). Assuming $V_{\text{in}}^{\min} \approx 15 \text{ mV}$ (JN limit), $V_{\text{read}}^{\min} \approx 30 \text{ mV}$. While the cross-section area of the MTJ does not affect the $V_{\text{read}}^{\min}/V_{\text{in}}^{\min}$ ratio, the values of $G_t$ and $\Delta G$ are affected significantly by the MTJ cross-section area, as shown in the inset on the right-hand side in Fig. 5.
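The factor of about 2 follows from Eq. (17). As a sketch, assuming the TMR $T$ is normalized to $G_{\rm AP}$ so that $G_t/\Delta G = (2+T)/T$, and neglecting $R_{\rm int}G_{\rm TI}$ by default:

```python
def read_to_input_ratio(tmr, r_int=0.0, g_ti=0.0):
    """V_read_min / V_in_min from Eq. (17) with G_t / dG = (2 + T) / T.

    tmr is a fraction (2.0 for 200%), assumed normalized to G_AP.
    """
    return (1.0 + r_int * g_ti) * (2.0 + tmr) / tmr


V_IN_MIN_MV = 15.0  # minimum input voltage in mV, as assumed in the text
for tmr_percent in (100, 200, 350, 600):
    ratio = read_to_input_ratio(tmr_percent / 100.0)
    print(f"TMR = {tmr_percent}%: ratio = {ratio:.2f}, V_read_min = {ratio * V_IN_MIN_MV:.1f} mV")
```

The ratio depends only on the TMR and not on the MTJ area, consistent with the area independence noted for Fig. 5.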

### E. Energy dissipation

The total energy dissipation consists of the energy required to charge and discharge the output node voltage,  $V_{out}$ , and the direct path conduction between  $V^+$  and  $V^-$ . Assuming that the read phase lasts for time  $\tau_{pulse}$ , the energy supplied by the voltage  $V^+$  is given by the following integral:

$$E_{\text{read}} = \int_0^{\tau_{\text{pulse}}} dt\, I_1(t) V^+ = \int_0^{\tau_{\text{pulse}}} dt\, [I_2(t) + I_3(t)] V^+, \qquad (18)$$

where $I_j(t)$ ($j = 1,2,3$) denotes the electric current flowing in the *j*th branch shown in Fig. 4, $I_2(t) = G_{\rm AP}[V_1(t) - V^-]$, and $I_3(t) = C_{\rm out}\,dV_{\rm out}(t)/dt + G_{\rm TI}V_{\rm out}(t)$, where $C_{\rm out} = C_{\rm TI} + C_{\rm int}$ is the net capacitive loading at the output node. Substituting $V_1(t)$ in terms of $V_{\rm out}(t)$ and using Eq. (16) gives the energy dissipation of the circuit, Eq. (19):

$$E_{\text{read}} = \underbrace{V_{\text{read}}^2 G_{\text{AP}} \tau_{\text{pulse}} \left[1 + \nu(1 + R_{\text{int}} G_{\text{TI}})\zeta\right]}_{E_{\text{read},1}} + \underbrace{\nu V_{\text{read}}^2 G_{\text{TI}} \tau_{\text{pulse}}\zeta}_{E_{\text{read},2}} + \underbrace{\nu V_{\text{read}}^2 C_{\text{out}} \left(1 + G_{\text{AP}} R_{\text{int}}\right) \left(1 - e^{-\tau_{\text{pulse}}/\tau_{\text{eq}}}\right)}_{E_{\text{read},3}}, \qquad (19)$$

where

$$\zeta = \left[1 - \frac{\tau_{\rm eq}}{\tau_{\rm pulse}} + \frac{\tau_{\rm eq}}{\tau_{\rm pulse}} \exp\left(-\frac{\tau_{\rm pulse}}{\tau_{\rm eq}}\right)\right].$$

The term  $E_{\text{read},1}$  in Eq. (19) is dominated by the energy dissipation due to the MTJ leakage. The second term,  $E_{\text{read},2}$ , is due to electric conduction through the TI, while the third term,  $E_{\text{read},3}$ , is due to the energy consumed in charging and discharging the output-node capacitance.
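As a consistency check, the integral in Eq. (18) can be evaluated numerically and compared with the closed form. The sketch below assumes that the charging term carries $C_{\rm out}$ (so that it has units of energy) and that $\zeta$ is the time average of $1 - e^{-t/\tau_{\rm eq}}$ over the pulse, that is, $\zeta = 1 - \tau_{\rm eq}/\tau_{\rm pulse} + (\tau_{\rm eq}/\tau_{\rm pulse})\,e^{-\tau_{\rm pulse}/\tau_{\rm eq}}$. All parameter values are illustrative.

```python
import math


def _nu_tau(g_p, g_ap, r_int, g_ti, c_out):
    """Return (nu, tau_eq) from Eqs. (16b) and (16c)."""
    g_t = g_p + g_ap
    nu = (g_p - g_ap) / g_t / (1.0 + r_int * g_ti)
    tau = (1.0 + r_int * g_t) / (g_t * (1.0 + r_int * g_ti)) * c_out
    return nu, tau


def e_read_closed_form(v_read, g_p, g_ap, r_int, g_ti, c_out, t_pulse):
    """Closed-form read energy with the three-term structure of Eq. (19)."""
    nu, tau = _nu_tau(g_p, g_ap, r_int, g_ti, c_out)
    x = tau / t_pulse
    zeta = 1.0 - x + x * math.exp(-t_pulse / tau)
    e1 = v_read**2 * g_ap * t_pulse * (1.0 + nu * (1.0 + r_int * g_ti) * zeta)
    e2 = nu * v_read**2 * g_ti * t_pulse * zeta
    e3 = nu * v_read**2 * c_out * (1.0 + g_ap * r_int) * (1.0 - math.exp(-t_pulse / tau))
    return e1 + e2 + e3


def e_read_numeric(v_read, g_p, g_ap, r_int, g_ti, c_out, t_pulse, steps=20000):
    """Trapezoidal integration of Eq. (18) using V_out(t) from Eq. (16), V_i = 0."""
    nu, tau = _nu_tau(g_p, g_ap, r_int, g_ti, c_out)
    v_f = nu * v_read

    def power(t):
        v_o = v_f * (1.0 - math.exp(-t / tau))
        i3 = c_out * (v_f / tau) * math.exp(-t / tau) + g_ti * v_o
        v1 = v_o + r_int * i3          # voltage at the MTJ midpoint node
        i2 = g_ap * (v1 + v_read)      # V^- = -V_read
        return (i2 + i3) * v_read      # V^+ = +V_read

    dt = t_pulse / steps
    total = 0.5 * (power(0.0) + power(t_pulse))
    total += sum(power(k * dt) for k in range(1, steps))
    return total * dt


args = dict(v_read=0.15, g_p=4e-6, g_ap=1e-6, r_int=50.0,
            g_ti=1e-6, c_out=1e-16, t_pulse=300e-12)
closed, numeric = e_read_closed_form(**args), e_read_numeric(**args)
print(f"closed form: {closed:.4e} J, numeric: {numeric:.4e} J")
```

Agreement between the two values indicates that the three terms account for the full integral under the stated assumptions.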

To reduce the leakage through the MTJ, $G_t$ should be lower for a fixed TMR. Unfortunately, this will increase $\tau_{\rm eq}$ and therefore the net delay of the VTOPSS. For example, for $r_{\rm int} = 1.25 \times 10^7\ \Omega/\text{m}$, $L_{\rm int} = 100$ nm, TMR of 222%, *RA* of $4.6 \times 10^{-11}\ \Omega\,\text{m}^2$, $\mathcal{A}_{\rm MTJ} = 50 \times 50\ \text{nm}^2$, and $C_{\rm out} = 0.1$ fF, $\tau_{\rm eq} \approx 1.85$ ps. For TMR of 350% and *RA* of $2.2 \times 10^{-9}\ \Omega\,\text{m}^2$, $\tau_{\rm eq}$ increases to approximately 89 ps. In the latter case, the charging time of the output capacitance will be comparable to the reversal delay of the MI layer and cannot be ignored. The parameter $\zeta$ in Eq. (19) will reduce below unity when $\tau_{\rm eq} \sim \tau_{\rm MI}$ (see the discussion in Sec. IV A).
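The two time constants quoted above can be checked against Eq. (16c). This sketch takes $G_t = \mathcal{A}_{\rm MTJ}/RA$ and assumes a TI leakage conductance of 10 $\mu$S, a value not stated for this example; because the interconnect resistance is only about 1 $\Omega$, the result is insensitive to that choice.

```python
def tau_eq(ra, area, r_int, g_ti, c_out):
    """Eq. (16c) with G_t = area / ra (RA defined with the total conductance)."""
    g_t = area / ra
    return (1.0 + r_int * g_t) / (g_t * (1.0 + r_int * g_ti)) * c_out


R_INT = 1.25e7 * 100e-9   # r_int * L_int, about 1.25 ohm
AREA = 50e-9 * 50e-9      # MTJ cross-section area in m^2
G_TI = 1e-5               # assumed TI leakage conductance in S
C_OUT = 0.1e-15           # output capacitance in F

tau_low_ra = tau_eq(4.6e-11, AREA, R_INT, G_TI, C_OUT)
tau_high_ra = tau_eq(2.2e-9, AREA, R_INT, G_TI, C_OUT)
print(f"{tau_low_ra * 1e12:.2f} ps, {tau_high_ra * 1e12:.1f} ps")
```

The results land within about 1% of the quoted 1.85 ps and 89 ps, and they show that $\tau_{\rm eq}$ is essentially $C_{\rm out}/G_t$ for such a short interconnect.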

### F. Dimensional scaling

The scaling of the performance metrics of the device with its dimensions depends on the interface area, $\mathcal{A}_{\rm int}$, which in turn depends on the integration scheme adopted for the device (see layout options in Fig. 2). Table I shows the scaling of the MI reversal delay and the energy dissipation due to leakage for vertically integrated (option A) and laterally integrated (option B) device layouts. We assume that even as the cross-section dimensions of the MI layer are scaled, the energy barrier is fixed so that the thermal stability of the MI layer is not compromised with scaling. For a fixed value of the uniaxial energy density, this can be achieved by appropriate variation of the thickness of the MI layer. Table I shows that for both layouts the speed of the writing mechanism in the VTOPSS scales inversely with a relevant length scale. This length scale is the length of the MI layer (easy axis) for option A, in which layers are stacked vertically. In option B, the relevant length scale is equal to the length of the TI layer. Likewise, the energy dissipation scales inversely with the width of the MI layer for option A and with the width of the TI layer for option B. Even though for both layouts the performance (delay and energy) scales inversely with an appropriate dimension, option B would be preferred due to its smaller footprint. In this layout the spin torque acts nearly uniformly across the cross-section area of the MI layer, promoting a more-uniform (more-coherent) magnetization reversal.

## IV. PERFORMANCE BENCHMARKING

To benchmark the performance of the VTOPSS against CMOS and existing spin-based devices, we first study the impact of device design on VTOPSS latency, energy, and EDP. Results are reported only for the vertically integrated device layout. For all results,  $\tau_{pulse} = \tau_{MI} + \tau_{eq}$ . Simulation parameters for all benchmarks are listed in Table II unless otherwise noted. For CMOS logic at the 2020 International Technology Roadmap for Semiconductors (ITRS) technology node, the effect of local interconnects (copper low- $\kappa$ ) on the performance metrics is considered.

### A. Latency and energy dissipation

In Fig. 6 the reversal delay of the MI layer and the total delay of the VTOPSS are plotted versus the read voltage on the MTJ. We assume a stochasticity parameter  $\eta_{st} = 10$  to guarantee a switching error rate as low as  $10^{-7}$ . Our results show that sub-500-ps reversal delay of the MI layer can be achieved with  $V_{read} \gtrsim 150$  mV for  $\sigma_{SHC} = 2 \times 10^5$   $(\Omega \text{ m})^{-1}$  and spin-coupling efficiency of 100%. A major reduction in delay results from the use of a TI material with a larger spin Hall conductivity. At a read voltage of 150 mV and  $\sigma_{SHC} = 5 \times 10^5 (\Omega \text{ m})^{-1}$ , the MI reversal

TABLE I. Performance scaling for the device layouts depicted in Fig. 13. In option A, layers are vertically stacked, while in option B, layers are placed laterally. The energy dissipation due to the MTJ leakage scales as  $E_{\text{MTJ}} \propto A_{\text{MTJ}} / \min(L_{\text{MI}}, L_{\text{TI}})$ .

| Layout | Interface area | Delay ($E_b$ fixed) | Energy (TI leakage) |
|----------|----------------|---------------------|---------------------|
| Option A | $\mathcal{A}_{\rm int} = W_{\rm MI} L_{\rm MI}$ | $\tau_{\rm MI} \sim \frac{1}{L_{\rm MI}} \left( \frac{W_{\rm TI}}{W_{\rm MI}} \right)$; $\tau_{\rm MI} \sim \frac{1}{L_{\rm MI}}$ if $W_{\rm TI}$ and $W_{\rm MI}$ are scaled proportionately | $E_{\rm TI} \sim \frac{1}{W_{\rm MI}} \left( \frac{L_{\rm TI}}{L_{\rm MI}} \right)$; $E_{\rm TI} \sim \frac{1}{W_{\rm MI}}$ if $L_{\rm TI}$ and $L_{\rm MI}$ are scaled proportionately |
| Option B | $\mathcal{A}_{\rm int} = W_{\rm TI} L_{\rm TI}$ | $\tau_{\rm MI} \sim \frac{1}{L_{\rm TI}}$ | $E_{\rm TI} \sim \frac{1}{W_{\rm TI}}$ |

TABLE II. Material parameters used for conducting simulations and performance benchmarking considering a vertically integrated device layout.

| Parameter | Value |
|-----------|-------|
| Magnetic insulator | |
| Energy barrier, $E_b$ | $30k_BT$ |
| Uniaxial magnetic anisotropy field, $H_K$ | 0.1 T |
| Gilbert damping coefficient, $\alpha$ | $3 \times 10^{-3}$ |
| Stochasticity parameter, $\eta_{st}$ | 10 (300 K); 2.6 (0 K) |
| Width of the MI layer, $W_{\rm MI}$ | 50 nm |
| Length of the MI layer, $L_{\rm MI}$ | 100 nm |
| Topological insulator | |
| Spin Hall conductivity, $\sigma_{SHC}$ | $5 \times 10^5\ \hbar/2e\ (\Omega\,\mathrm{m})^{-1}$ |
| Sheet resistance of the TI, $R_{\text{sheet}}$ | $100 \text{ k}\Omega/\Box$ |
| Relative dielectric permittivity of the TI, $\epsilon_r$ | 110 |
| Efficiency of spin coupling at the TI-MI interface, $\varepsilon$ | 1.0 |
| Width of the TI layer, $W_{\text{TI}}$ | 100 nm |
| Length of the TI layer, $L_{\text{TI}}$ | 100 nm |
| Interconnect and magnetic tunnel junction | |
| Effective resistivity of the interconnect, $\rho_{\text{eff}}$ | $2.5 \times 10^{-6}\ \Omega\,\mathrm{m}$ |
| Capacitance per unit length of the interconnect, $c_{ic}$ | 1.6 pF/cm |
| Interconnect length, $L_{ic}$ | 100 nm |
| Interconnect width, $W_{ic}$ | 50 nm |
| Interconnect thickness, $t_{ic} = 2W_{ic}$ (aspect ratio 2) | 100 nm |
| Resistance-area product of the MTJ | $2.2 \times 10^{-9}\ \Omega\,\mathrm{m}^2$ |
| Tunneling magnetoresistance of the MTJ | 350% |
| Area of the MTJ, $A_{\rm MTJ}$ | $50 \times 50 \text{ nm}^2$ |

delay is approximately 273 ps for 100% spin-coupling efficiency. The effect of spin-coupling efficiency on the delay is examined in the inset in Fig. 6 at $V_{\text{read}} = 150 \text{ mV}$. For a practical spin-coupling efficiency of 50%, the delay of the VTOPSS is approximately 445 ps. Further reduction in delay can be achieved if the leakage through the MTJ stack is eliminated by fabrication of MTJs with a higher TMR. Finally, as shown in Table I, increasing the length of the MI layer can help reduce the delay further; however, this reduction comes at the cost of lower device integration density.

FIG. 6. Switching delay of the VTOPSS versus read voltage for various values of the spin Hall conductivity, $\sigma_{SHC}$, of the TI layer. We choose spin-coupling efficiency $\varepsilon = 100\%$. Other simulation parameters are given in Table II. The inset shows the impact of $\varepsilon$ on the total delay of the device.

The effect of read voltage on the energy and EDP of the device is shown in Figs. 7(a) and 7(b). The total energy dissipation scales nearly quadratically with the read voltage. While $E_{\text{read},3}$ scales as $V_{\text{read}}^2$, the product of $\tau_{\text{pulse}}$ and $\zeta$ is nearly independent of $V_{\text{read}}$, which also makes the dominant components of energy dissipation (i.e., due to the TI and MTJ leakage) scale as $V_{\text{read}}^2$ [see Fig. 7(c)]. Unlike CMOS technology, where the relationship between energy dissipation and supply voltage is exactly quadratic, in the case of the VTOPSS, the relationship depends on the values of various material and geometrical parameters of the device. For $\tau_{\rm eq} \sim \tau_{\rm pulse}$, $E_{\rm read}$ scales as $V_{\rm read}^2$; however, the relationship changes to linear (i.e., $E_{\rm read} \sim V_{\rm read}$) for $\tau_{\rm eq} \ll \tau_{\rm pulse}$. At a read voltage of 150 mV, the energy dissipation of the VTOPSS for infinite sheet resistance (gapped surface states and negligible bulk conductivity) is as low as 3.5 aJ if the spin-coupling efficiency at the TI-MI interface is 100%. This energy dissipation increases by 2–3 orders of magnitude for Bi<sub>2</sub>Se<sub>3</sub> thin films with a sheet resistance of 1–10 k$\Omega/\Box$ measured at room temperature as reported in Ref. [59]. The EDP of the VTOPSS increases with increase of the MTJ read voltage. At 150 mV, the EDP decreases from $9.5 \times 10^{-26}$ J s at $R_{\text{sheet}} = 10 \text{ k}\Omega/\Box$ to $9.5 \times 10^{-28}$ J s when there is no leakage through the TI.

FIG. 7. Effect of read voltage on (a) energy dissipation, (b) EDP, and (c) $\tau_{eq}$ and $\zeta$.

Figure 8 shows the contribution of various terms in Eq. (19) to the overall VTOPSS energy dissipation. For large TI conductance, corresponding to a large value of the sheet conductance of the TI, the net energy dissipation is limited by the TI leakage [$E_{\text{read},2}$ in Eq. (19)]. Assuming that the TI conductance can be suppressed such that $G_{\text{sheet}} \rightarrow 0$, we see that the energy dissipation is limited by the MTJ leakage [$E_{\text{read},1}$ in Eq. (19)] for low values of the TMR and *RA*, while for large values of the TMR and *RA*, both MTJ leakage and charging of the output capacitance will limit the total energy of the VTOPSS. The energy associated with charging or discharging output-node capacitances remains well under 2 aJ for all cases considered here ($C_{\rm out} = 113$ aF), and is significant only when both $G_{\rm TI}$ and $G_{\rm AP}$ are low [Fig. 8(b)].

FIG. 8. Energy dissipation versus sheet conductance of the TI for (a) large and (b) small *RA* and TMR at $V_{\text{read}} = 150 \text{ mV}$.

### B. Effect of MTJ design on EDP

One can reduce the EDP of the VTOPSS by designing junctions that exhibit a large TMR: an increase in TMR at a fixed *RA* reduces both the switching delay and the energy dissipation. As shown in Fig. 9(a), the scaling of the EDP with TMR in the VTOPSS can be expressed as $\mathcal{P}_{\rm ED} \propto T^{b}$, where $T$ denotes the TMR and the value of the exponent *b*, typically in the range from -0.5 to -1, depends on the material and geometrical parameters of the device.

As shown in Fig. 9(b), the EDP initially decreases with an increase in the RA. With further increase in the RA, the EDP exhibits the reverse trend and begins to increase. For RA less than the optimal value, the EDP is inversely proportional to the RA. Beyond the optimal RA, unfortunately the delay associated with charging/discharging capacitive nodes becomes much larger than the MI reversal delay. The optimal RA depends on the material and geometry of the device. For the results shown in Fig. 9, the optimal RA decreases with a reduction in $R_{\text{sheet}}$. Moreover, the EDP-RA contour becomes flatter around the optimal point as $R_{\rm sheet}$ reduces. The results show that at a TMR of 600% and without any leakage through the TI, the optimal EDP of the VTOPSS is around $4 \times 10^{-28}$ J s at $V_{\text{read}} = 50 \text{ mV}$ and $8 \times 10^{-28}$ J s at $V_{\text{read}} = 150$ mV. For the same parameters, the energy dissipation and the delay of the VTOPSS are 0.9 aJ (5 aJ) and 486 ps (192 ps), respectively, at $V_{\text{read}} = 50 \text{ mV}$ (150 mV). In typical MTJs, TMR increases with increasing RA, which can be harnessed to reduce the EDP of the VTOPSS. In Ref. [61], it was shown that a TMR of 600% can be obtained with a RA of $10^4\ \Omega\,\mu\text{m}^2$ in (Co,Fe)B/MgO/(Co,Fe)B-type MTJs by annealing the structure above 500 °C. For applications involving STT magnetic RAM, a large RA is undesirable as it increases the voltage required to switch the magnetization state via current-induced spin torques [62]. In the case of VTOPSS technology, the MTJ cell is used only to generate a rather low output voltage that must be sufficient to switch the subsequent logic stage. As such, a large RA of the read unit on the MTJ stack may be beneficial for the design of a VTOPSS.

FIG. 9. Effect of (a) TMR and (b) RA on the EDP of the VTOPSS at $V_{\text{read}} = 150 \text{ mV}$. For (b), TMR is 600%.

### C. Interconnect considerations

To transmit information between VTOPSS logic stages, conventional CMOS-compatible metallic interconnects, such as copper low-$\kappa$ or aluminum, can be used. From examination of Eq. (16), the effect of interconnect resistance on the device performance appears in the time constant $\tau_{\rm eq}$ for charging and discharging the output-node capacitance. In Fig. 10(a), we study the effect of interconnect length on limiting the net delay of the VTOPSS. For very low read voltages, the MI time constant dominates, allowing us to ignore the interconnect impact for interconnects as long as a few micrometers [solid line in Fig. 10(a)]. However, as we increase the read voltage, allowing faster reversal of the MI layer, interconnects as short as a few hundred nanometers become important in quantifying the performance metrics of the device. The maximum interconnect length that can be tolerated for $\tau_{\rm eq} \lesssim \tau_{\rm MI}$ is given as

$$L_{\rm ic}^{\rm max} = \left(\frac{G_{\rm TI}\tau_{\rm MI}}{2c_{\rm ic}} - \frac{1}{2r_{\rm ic}G_t}\right) + \sqrt{\left(\frac{c_{\rm ic} - r_{\rm ic}G_tG_{\rm TI}\tau_{\rm MI}}{2r_{\rm ic}c_{\rm ic}G_t}\right)^2 + \frac{\tau_{\rm MI}}{r_{\rm ic}c_{\rm ic}}}. \qquad (20)$$

FIG. 10. (a) Impact of interconnect length on the time constant $\tau_{eq}$, which quantifies the charging or discharging rate of the output-node capacitance. (b) Maximum interconnect length that yields $\tau_{eq} = \tau_{MI}$. For interconnects longer than $L_{ic}^{max}$, the charging/discharging time of the output node exceeds the MI reversal delay.

For $L_{\rm ic} > L_{\rm ic}^{\rm max}$, $\tau_{\rm eq}$ exceeds $\tau_{\rm MI}$. As shown in Fig. 10(b), as $\rho_{\rm eff} \rightarrow 0$, the time constant $\tau_{\rm eq} \rightarrow G_t^{-1}C_{\rm out}$. For finite leakage through the TI, the maximum interconnect length is limited to approximately 1 $\mu$m for $\rho_{\rm eff}$ up to $10^{-5}\ \Omega$ m. For negligible TI leakage ($R_{\rm sheet} \rightarrow \infty$), the effect of the interconnect is largely suppressed, implying that highly resistive and long interconnects up to several hundreds of micrometers can be used in VTOPSS technology without negatively impacting the overall performance. As a result, there exists a wide range of interconnect options, such as copper, ultrascaled wires (wire width much less than the electron mean free path), and doped semiconducting wires, to design VTOPSS logic.
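Equation (20) is the positive root of the quadratic in $L_{\rm ic}$ obtained by setting $\tau_{\rm eq} = \tau_{\rm MI}$ in Eq. (16c). The sketch below evaluates it and verifies the root, assuming (as the form of Eq. (20) implies) that the output capacitance is approximated by the interconnect contribution $c_{\rm ic}L_{\rm ic}$. The value of $\tau_{\rm MI}$ and the other inputs are illustrative choices based on Table II.

```python
import math


def l_ic_max(tau_mi, r_ic, c_ic, g_t, g_ti):
    """Maximum interconnect length from Eq. (20); SI units throughout."""
    linear = g_ti * tau_mi / (2.0 * c_ic) - 1.0 / (2.0 * r_ic * g_t)
    disc = ((c_ic - r_ic * g_t * g_ti * tau_mi) / (2.0 * r_ic * c_ic * g_t)) ** 2
    disc += tau_mi / (r_ic * c_ic)
    return linear + math.sqrt(disc)


def tau_eq(length, r_ic, c_ic, g_t, g_ti):
    """Eq. (16c) with R_int = r_ic * L and C_out approximated by c_ic * L."""
    r_int = r_ic * length
    return (1.0 + r_int * g_t) / (g_t * (1.0 + r_int * g_ti)) * (c_ic * length)


# Illustrative inputs: wire with rho_eff / A_ic = 5e8 ohm/m and 1.6 pF/cm,
# G_t from RA = 2.2e-9 ohm m^2 over a 50 nm x 50 nm MTJ, R_sheet = 100 kohm/sq.
p = dict(r_ic=5e8, c_ic=1.6e-10, g_t=2.5e-15 / 2.2e-9, g_ti=1e-5)
TAU_MI = 300e-12  # assumed MI reversal delay in s
l_max = l_ic_max(tau_mi=TAU_MI, **p)
print(f"L_ic_max = {l_max * 1e6:.2f} um, tau_eq at L_ic_max = {tau_eq(l_max, **p) * 1e12:.1f} ps")
```

Evaluating $\tau_{\rm eq}$ at the returned length should reproduce the assumed $\tau_{\rm MI}$, which is a quick way to confirm that the signs in Eq. (20) are consistent with Eq. (16c).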

### D. Comparison against existing logic devices

#### 1. CMOS metrics

The model used for computing the performance metrics of CMOS technology comprises a minimum-sized CMOS inverter driving a similarly sized load through a copper, low- $\kappa$  interconnect. With the Elmore delay model, the delay of the CMOS circuit is given as [63]

$$\tau_{\rm CMOS} = 0.69R_{\rm S} \left( C_S + C_L \right) + 0.69 \left( R_S c_{\rm ic} + r_{\rm ic} C_L \right) L_{\rm ic} + 0.38 r_{\rm ic} c_{\rm ic} L_{\rm ic}^2, \quad (21)$$

where $R_S$ and $C_S$ are the source resistance and capacitance, respectively, and $C_L$ is the load capacitance (assumed to be equal to $C_S$). The energy dissipation of the CMOS circuit is given as

$$E_{\rm CMOS} = (C_S + C_L + c_{\rm ic}L_{\rm ic}) V_{\rm DD}^2, \qquad (22)$$

where  $V_{DD}$  is the supply voltage.

The CMOS-device metrics are taken from the ITRS for the 2020 technology node (half pitch of metal 1 of 18 nm). For a minimum-sized inverter, $R_S \approx 78$ k$\Omega$, $C_S = 0.68$ fF/$\mu$m, $C_L = 0.38$ fF/$\mu$m, $\rho_{\rm eff} = 25$ $\mu\Omega$ cm, $c_{\rm ic} = 1.6$ pF/cm, and the interconnect aspect ratio is 2 [64]. With the interconnect-related delay omitted, the delay of the CMOS circuit is approximately 1.1 ps at an energy dissipation of 10 aJ per bit. This yields an EDP of $1.1 \times 10^{-29}$ J s. For an interconnect length of 100 nm, the delay of the CMOS circuit is 2.1 ps at an energy dissipation of 20 aJ per bit and an EDP of $4.2 \times 10^{-29}$ J s.
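Equations (21) and (22) are straightforward to evaluate. In the sketch below, the per-micrometer capacitances are converted to lumped values with an assumed device width of 18 nm, and the wire is assumed to be 18 nm wide with aspect ratio 2; neither assumption is stated in the text, so the delays are only expected to land near the quoted values.

```python
def cmos_delay(r_s, c_s, c_l, r_ic, c_ic, l_ic):
    """Elmore delay of driver, distributed wire, and load, Eq. (21)."""
    return (0.69 * r_s * (c_s + c_l)
            + 0.69 * (r_s * c_ic + r_ic * c_l) * l_ic
            + 0.38 * r_ic * c_ic * l_ic ** 2)


def cmos_energy(c_s, c_l, c_ic, l_ic, v_dd):
    """Switching energy of the same circuit, Eq. (22)."""
    return (c_s + c_l + c_ic * l_ic) * v_dd ** 2


# Hypothetical lumped values: capacitances per micrometer from the text,
# scaled by an assumed 18-nm (0.018-um) device width.
WIDTH_UM = 0.018
c_s, c_l = 0.68e-15 * WIDTH_UM, 0.38e-15 * WIDTH_UM
r_ic = 2.5e-7 / (2.0 * (18e-9) ** 2)   # rho_eff / A_ic in ohm/m (25 uohm cm)
c_ic = 1.6e-10                          # 1.6 pF/cm in F/m

t0 = cmos_delay(78e3, c_s, c_l, r_ic, c_ic, 0.0)
t100 = cmos_delay(78e3, c_s, c_l, r_ic, c_ic, 100e-9)
print(f"delay: {t0 * 1e12:.2f} ps (no wire), {t100 * 1e12:.2f} ps (100-nm wire)")

# The EDP values quoted in the text are the products of delay and energy.
edp_no_wire = 1.1e-12 * 10e-18
edp_100nm = 2.1e-12 * 20e-18
print(f"EDP: {edp_no_wire:.2e} J s, {edp_100nm:.2e} J s")
```

With these assumed widths the sketch gives delays close to, though not exactly, the 1.1 ps and 2.1 ps quoted above; the linear wire term $0.69 R_S c_{\rm ic} L_{\rm ic}$ accounts for most of the increase at 100 nm.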

#### 2. CMOS scaling

To study CMOS scaling over process nodes, we consider only the intrinsic delay of a CMOS transistor:  $\tau_{\rm CMOS} = C_{\rm par}V_{\rm DD}/I_{\rm DSAT}$  (drain saturation), where  $V_{\rm DD}$  is the supply voltage and  $C_{\rm par}$  and  $I_{\rm DSAT}$  are the parasitic capacitance and the maximum *on* current, respectively, of the transistor. At the same technology node, increasing the transistor width does not change its intrinsic delay as both  $C_{\rm par}$  and  $I_{\rm DSAT}$ are proportional to the device width. If we consider different technology nodes from ITRS, we see that the switching delay of CMOS transistors is roughly the same as the minimum feature size scales from 17.9 nm (year 2020) to 8.9 nm (year 2026), as shown in Fig. 11. This scaling is consistent with the fact that the on-chip clock frequencies have not increased over the last several years with scaling.

FIG. 11. Scaling of the delay and energy dissipation of a silicon transistor with technology year. The data are taken from the 2013 ITRS update.

The intrinsic energy dissipation of CMOS transistors is given as $E_{\text{CMOS}} = C_{\text{par}} V_{\text{DD}}^2$. At the same technology node, increasing the transistor width will increase its energy dissipation due to the increase of the parasitic capacitance. This is contrary to the scaling of energy with dimensions in the VTOPSS technology (see Table I). To interpret $E_{\rm CMOS}$ across technology nodes, we use ITRS projections for the years 2020–2026. ITRS data indicate that $C_{\rm par} \sim W^{1.37}$ and $V_{\rm DD} \sim W^{0.2}$, where *W* is the transistor width. Therefore, as the CMOS technology scales, its intrinsic energy dissipation decreases. The results in Fig. 11 show this scaling behavior. However, the rate of decrease of energy dissipation tends to slow down towards the ITRS year 2024 with a minimum feature size of 11.3 nm.
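Combining the two ITRS trends gives the width dependence of the intrinsic energy. A short sketch of the arithmetic follows; the factor-of-two width reduction is an arbitrary example chosen here.

```python
C_PAR_EXP = 1.37   # C_par ~ W^1.37 (ITRS trend quoted in the text)
V_DD_EXP = 0.2     # V_DD ~ W^0.2

# E_CMOS = C_par * V_DD^2, so the exponents combine as 1.37 + 2 * 0.2.
energy_exp = C_PAR_EXP + 2.0 * V_DD_EXP


def energy_ratio(width_scale):
    """Ratio E_new / E_old when the transistor width is multiplied by width_scale."""
    return width_scale ** energy_exp


print(f"E_CMOS ~ W^{energy_exp:.2f}; halving W scales the energy by {energy_ratio(0.5):.2f}")
```

The resulting dependence, roughly $E_{\rm CMOS} \sim W^{1.77}$, means that halving the width lowers the intrinsic energy to a bit under one third of its original value.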

#### 3. Spin-based devices

Existing spin-based devices that are considered for comparison include ASL [65], CSL [66], and MESO logic [51] devices. ASL uses filtering of electric current through a nanomagnet to generate spin-polarized current, which communicates spin information between input-output nanomagnets via a nonmagnetic conductor (e.g., copper, aluminum) that serves as the interconnect. Unlike charge current, spin current is not conserved; therefore, the design of interconnects in ASL requires careful consideration [67]. On the other hand, CSL uses the spin Hall effect to convert electric current carrying information into spin-polarized current, which is used to switch the state of an input nanomagnet. The orientation of the input nanomagnet is communicated to an output nanomagnet via their mutual magnetic dipolar coupling. The magnetization state of the output nanomagnet is read through a MTJ, which generates an output electric current with the polarity and amplitude dependent on the orientation of the output nanomagnet and the voltage applied on the MTJ. Since information is communicated via electric current, there is no loss of information in the interconnect. However, because of the flow of electric current through a heavy-metal layer with a high effective resistivity, CSL has high Joule heating. The MESO logic device, recently proposed in Ref. [51], uses magnetoelectric transduction to convert electric current into spin current at the input side, while spin-orbit coupling is used at the output end for spin-to-charge transduction. That is, the input and output state variables are encoded in electric current. Benchmarking activities have shown that magnetoelectric mediated spin devices have energy dissipation comparable to that of CMOS devices [8].

Table III shows the performance metrics of the VTOPSS in comparison with spin-based devices. The performance of the VTOPSS exceeds that of existing spin-based devices. The performance metrics of the ASL and MESO logic devices in Table III do not include switching and write errors and can be considered to be representative of operation at 0 K. The performance metrics of the CSL device reported in Ref. [31] assume ideal dipolar coupling between the read and write magnets and ignore switching errors. The physics of magnetic dipolar coupling and its impact on the reliability of the CSL device are reported in Refs. [68,69]. In Ref. [70], as noted in Table III, performance estimates of the CSL device are obtained in the presence of practical dipolar coupling with $e_{b0} = 5$ and $\mathcal{P}_{err} = 0.05$. The energy dissipation of the VTOPSS is 100 times lower than that of ASL and CSL devices, while the delay of the VTOPSS is 2–10 times lower than that of ASL and CSL devices, respectively. The delay of the VTOPSS is comparable to that of the MESO logic device and can be reduced further by use of MI layers with lower damping and/or MI switching via precessional dynamics. In terms of energy dissipation, the VTOPSS performs slightly better than the MESO logic device. The energy dissipation can be further reduced through material optimization, particularly with a higher TMR and *RA* of the MTJ used for sensing the state of the MI layer in the VTOPSS.

## V. UNIVERSAL BOOLEAN LOGIC IMPLEMENTATION

A complete set of two-input Boolean functions can be implemented with use of the schematic shown in Fig. 12, where  $V_A$  and  $V_B$  refer to primary signal inputs and  $V_X$ denotes the tie-breaking input signal. To change the functionality between true and complementary outputs, the polarity of the supply voltage signals on the MTJ is swapped. To implement a NAND gate,  $V_X$  is set to its negative value, while for a NOR gate,  $V_X$  is set to its positive voltage. For XOR and XNOR functionality, one of the primary inputs is applied as a voltage signal on the TI, while

TABLE III. Overview of performance metrics of various spin-based devices. The performance metrics of the VTOPSS exceed those of existing spin-based technologies. The EDP of low-power CMOS technology at the 2020 ITRS technology node is approximately  $4 \times 10^{-29}$  J s (see the text for calculations). For ASL and MESO logic devices, the energy barrier of the magnet is  $40k_BT$  and the variation in their performance due to thermal noise is ignored; therefore, their performance is representative of results at 0 K. For the CSL device, the performance is reported at 95% accuracy of computation using magnets with an energy barrier of  $5k_BT$ . The physics of dipolar coupling is included according to Refs. [68,69]. For the VTOPSS, the write-error probability is  $10^{-7}$  at 300 K, while the thermal stability is  $3.4 \times 10^4$  s. The energy dissipation and EDP of the VTOPSS can be reduced by an order of magnitude by suppression of leakage through the TI.

| Metric | ASL [50] | CSL [70] | MESO logic [51] | VTOPSS (this work), $V_{\rm read} = 0.15$ V (300 K/0 K) | VTOPSS (this work), second $V_{\rm read}$ setting (300 K/0 K) |
|---|---|---|---|---|---|
| Input-output transduction | Voltage<br>$V \rightarrow \mathbf{m} \rightarrow I_{\text{spin}} \rightarrow \mathbf{m} \rightarrow V$ | Electric current<br>$I_{\text{elec}} \rightarrow \mathbf{m} \rightarrow I_{\text{elec}}$ | Electric current<br>$I_{\text{elec}} \rightarrow \mathbf{m} \rightarrow I_{\text{elec}}$ | Voltage<br>$V \rightarrow \mathbf{m} \rightarrow V$ | |
| Energy per bit | 0.34 fJ<sup>a</sup> | 1.31 fJ | 27 aJ | 38 aJ/16 aJ | 10 aJ/3.5 aJ |
| Switching delay | 0.5 ns | 3 ns | 250 ps<sup>b</sup> | 272 ps/145 ps | 617 ps/235 ps |
| Energy-delay product (J s) | $17 \times 10^{-26}$ | $3.9 \times 10^{-24}$ | $6.75 \times 10^{-27}$ | $10^{-26}/2.3 \times 10^{-27}$ | $6.3 \times 10^{-27}/8.3 \times 10^{-28}$ |
| Area | $3.8 \times 10^{-3}\ \mu\mathrm{m}^2$ | $1.6 \times 10^{-3}\ \mu\mathrm{m}^2$ | $1.4 \times 10^{-2}\ \mu\mathrm{m}^2$ | $(1\text{–}2) \times 10^{-3}\ \mu\mathrm{m}^2$<sup>c</sup><br>$(1\text{–}2) \times 10^{-2}\ \mu\mathrm{m}^2$<sup>d</sup> | |
| Fan-out | No | Yes | Yes | Yes | |

<sup>a</sup>Results for perpendicular-magnetic-anisotropy magnets.

<sup>b</sup>Total pulse width reported in Ref. [51].

<sup>c</sup>Conservative estimate for a vertically integrated device.

<sup>d</sup>Estimate reported for a laterally integrated device assuming the areas of the TI layer and the MTJ are  $100 \times 100 \text{ nm}^2$  and the spacing between the TI and the MTJ is 50 nm.
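The energy-delay product row of Table III is the product of the energy-per-bit and switching-delay rows. As a minimal sketch of that arithmetic, the Python snippet below recomputes the EDP from the tabulated energies and delays. The dictionary keys and the function name `edp` are illustrative labels introduced here, not names from the paper, and "col 1" and "col 2" simply denote the two VTOPSS columns of the table.

```python
# Energy per bit (J) and switching delay (s), transcribed from Table III.
# Keys are illustrative labels; "col 1"/"col 2" are the two VTOPSS columns.
TABLE_III = {
    "ASL": (0.34e-15, 0.5e-9),
    "CSL": (1.31e-15, 3e-9),
    "MESO": (27e-18, 250e-12),
    "VTOPSS col 1, 300 K": (38e-18, 272e-12),
    "VTOPSS col 1, 0 K": (16e-18, 145e-12),
    "VTOPSS col 2, 300 K": (10e-18, 617e-12),
    "VTOPSS col 2, 0 K": (3.5e-18, 235e-12),
}


def edp(energy_j, delay_s):
    """Energy-delay product in J s."""
    return energy_j * delay_s


# Recompute the EDP for every device and operating condition.
EDP = {name: edp(e, t) for name, (e, t) in TABLE_III.items()}

if __name__ == "__main__":
    for name, value in EDP.items():
        print(f"{name:22s} EDP = {value:.2e} J s")
```

The recomputed products agree with the tabulated EDP values to within rounding of the listed energies and delays.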



FIG. 12. All two-input Boolean logic functions can be implemented with the same device layout. The primary inputs are denoted as  $V_A$  and  $V_B$ , while the signal  $V_X$  denotes the tie-breaking signal to change the Boolean functionality. To switch between inverting and noninverting logic (different polarities of  $V_{\text{out}}$ ), the polarity of the signals  $V^+$  and  $V^-$  at the MTJ can be interchanged. GND labels show the ground contacts.

the other primary input serves as the supply voltage on the MTJ in the read unit. In the case of copy-invert functions, the tie-breaking signal  $V_X$  is set to 0 V. Alternatively, the schematic shown in Fig. 1 can be used for copy and invert Boolean operations. However, by use of a generic layout as in Fig. 12, all 16 Boolean operations possible for two input signals can be implemented directly by permutation of the polarities of the MTJ supply and the control voltage. Another major advantage of the VTOPSS is its ability to support logic locking [71] and encryption at the device level by preventing optical-based reverse-engineering attacks [72]. The innate polymorphism of the VTOPSS will enable runtime reconfigurability where the actual function to be implemented is determined on the fly with a key or control input. The resilience of the VTOPSS against existing hardware attacks, prominently those based on the Boolean satisfiability test, will be investigated in future work.
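The role of the tie-breaking input can be illustrated with an idealized behavioral model. The Python sketch below is not the device physics developed in this paper. It assumes that logic levels map to normalized voltages of ±1, that the MI state follows the sign of the summed inputs $V_A + V_B + V_X$, and that swapping the MTJ supply polarity simply inverts the output. For XOR and XNOR, it assumes the output is the product of the MI state written by one input and the MTJ supply polarity set by the other; which supply polarity yields XOR rather than XNOR in a real device is an assumption here. All function names are hypothetical.

```python
def to_voltage(bit):
    """Encode logic 1 as +1 and logic 0 as -1 (normalized voltage, assumed)."""
    return 1 if bit else -1


def to_bit(voltage):
    """Decode a normalized output voltage back to a logic level."""
    return 1 if voltage > 0 else 0


def threshold_gate(a, b, v_x, invert=True):
    """Idealized gate: MI state = sign(V_A + V_B + V_X), with V_X = +1 or -1.

    invert=True models the inverting MTJ supply polarity (NAND/NOR);
    invert=False models the swapped polarity (AND/OR).
    """
    total = to_voltage(a) + to_voltage(b) + v_x  # odd sum, so never zero
    m = 1 if total > 0 else -1
    return to_bit(-m if invert else m)


def product_gate(a, b, invert=True):
    """Idealized XOR/XNOR: input A sets the MI state, input B the MTJ supply."""
    m = to_voltage(a)
    out = m * to_voltage(b)  # +1 when the inputs agree
    return to_bit(-out if invert else out)


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def truth_table(gate, **kwargs):
    """Evaluate a gate over all two-input combinations."""
    return [gate(a, b, **kwargs) for a, b in INPUTS]


if __name__ == "__main__":
    print("NAND", truth_table(threshold_gate, v_x=-1))
    print("NOR ", truth_table(threshold_gate, v_x=+1))
    print("XOR ", truth_table(product_gate))
```

Under these assumptions, a negative $V_X$ yields NAND and a positive $V_X$ yields NOR, consistent with the description above, and swapping the output polarity gives the complementary AND and OR functions from the same layout.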

The device layout corresponding to the universal logic gate is shown in Fig. 13, where the device area for a universal gate is approximately 0.06  $\mu$ m<sup>2</sup>, assuming relatively large values of the cross-section areas of the TI-MI interface, MTJ, and the interconnect. The area can be reduced significantly by the patterning of narrower TI


FIG. 13. Device layout for the schematic shown in Fig. 12. Here it is assumed that the cross-section area of the TI layer is  $100 \times 100 \text{ nm}^2$  and the spacing between adjacent TI layers is 50 nm. The MTJ cross-section area is the same as that of the TI-MI interface. The total area is approximately 0.06  $\mu$ m<sup>2</sup>. GND labels show the ground contacts.

and MI layers and reduction of the cross-section dimensions of the MTJ. The latter approach, in particular, is advantageous for reducing the device footprint without a negative impact on the device performance metrics.

## **VI. CONCLUSIONS**

Computational electronics can, in principle, be realized with any state variable that is stable over device-relevant timescales, and with any low-loss communication mechanism between devices that allows fan-out. In this regard, storing and manipulating information in magnetic materials is promising. Magnetic materials have a large number of electron spins that are locked together by their exchange interaction such that the reorientation energy per spin to move the magnetization collectively can be on the order of millielectronvolts. Magnetization reversal solely through electric fields is critical toward paving the path for an ultralow-energy computing substrate. The efficiency of voltage-spin conversion must be significantly higher to allow ultralow-voltage operation to be competitive with CMOS technology.

In this paper, a VTOPSS based on a hybrid TI-MI magnetoelectric structure is presented. The device has the following important features: (i) innate polymorphism (i.e., it can implement all 16 two-input Boolean operations using the same layout), (ii) CMOS compatibility (input and output variables are in the voltage domain), (iii) extremely large intrinsic gain for charge-to-spin conversion owing to the ultrahigh spin Hall conductivity of the TI material, (iv) ability to support fan-out, (v) sub-10-mV operation with energy of less than 10 aJ per bit, (vi) ability to lower the EDP to the order of  $10^{-29}$  J s (competitive with CMOS technology), (vii) elimination of electric-current-carrying wires as the operation is fully based on voltage-to-voltage conversion with transmission of information via capacitive charging and discharging of wires, and (viii) ultralow damping of the MI layer allowing ultrafast operation on the order of a few hundred picoseconds via antidamping spin torque.

We develop analytic models to quantify the performance of the VTOPSS and benchmark the results against existing CMOS and spin-based devices. Our results conclusively show that for the current state-of-the-art material parameters, the VTOPSS exceeds the performance of all-spin logic, charge-spin logic, and magnetoelectric spin-orbit logic devices. Improvements in material parameters and device design can readily facilitate sub-attojoule-energy-per-bit operation with an energy-delay product of approximately  $10^{-29}$  J s for the VTOPSS to be competitive against CMOS devices at the 2020 ITRS technology node. Future work will address important issues pertinent to multidomain effects in both uniaxial and biaxial magnetic insulators and the effects of thermal stochasticity for subcritical excitation. Unlike CMOS devices, the VTOPSS can also provide logic locking due to the uniform device-level layout that makes it virtually impossible to probe the functionality with reverse-engineering hardware attacks. The ability of the VTOPSS to thwart state-of-the-art Boolean satisfiability attacks is yet to be examined and will be considered in future work.

## ACKNOWLEDGMENTS

This work was supported partially by the MRSEC Program of the National Science Foundation under Award No. DMR-1420073. A.D.K. acknowledges support from Grant No. NSF-DMR-1610416. The authors thank Professor Nitin Samarth at Pennsylvania State University for useful discussions during the preparation of the manuscript. S.R. thanks Nikhil Rangarajan at New York University for his help in generating some of the graphics in the paper.

- [1] Claude Chappert, Albert Fert, and Frédéric Nguyen Van Dau, The emergence of spin electronics in data storage, Nat. Mater. 6, 813 (2007).
- [2] Nicolas Locatelli, Vincent Cros, and Julie Grollier, Spin-torque building blocks, Nat. Mater. **13**, 11 (2014).
- [3] Nickvash Kani and Azad Naeemi, Pipeline design in spintronic circuits, in 2014 International Symposium on Nanoscale Architectures (NANOARCH) (IEEE/ACM, 2014), pp. 110–115.
- [4] Yiming Huai, Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects, AAPPS Bull. **18**, 33 (2008).
- [5] Luqiao Liu, Chi-Feng Pai, Y. Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman, Spin-torque switching with the giant spin Hall effect of tantalum, Science 336, 555 (2012).
- [6] J. A. Katine and Eric E. Fullerton, Device implications of spin-transfer torques, J. Magn. Magn. Mater. 320, 1217 (2008).
- [7] Dmitri E. Nikonov, George I. Bourianoff, Graham Rowlands, and Ilya N. Krivorotov, Strategies and tolerances of spin transfer torque switching, J. Appl. Phys. **107**, 113910 (2010).
- [8] Dmitri Nikonov and Ian Young, Benchmarking of beyond-CMOS exploratory devices for logic integrated circuits, IEEE J. Exploratory Solid-State Comput. Devices Circuits 1, 3 (2015).
- [9] Ying-Hao Chu, Lane W. Martin, Mikel B. Holcomb, Martin Gajek, Shu-Jen Han, Qing He, Nina Balke, Chan-Ho Yang, Donkoun Lee, Wei Hu, Qian Zhan, Pei-Ling Yang, Arantxa Fraile-Rodríguez, Andreas Scholl, Shan X. Wang, and R. Ramesh, Electric-field control of local ferromagnetism using a magnetoelectric multiferroic, Nat. Mater. 7, 478 (2008).
- [10] Chun-Gang Duan, Julian P. Velev, Renat F. Sabirianov, Ziqiang Zhu, Junhao Chu, Sitaram S. Jaswal, and Evgeny Y. Tsymbal, Surface Magnetoelectric Effect in Ferromagnetic Metal Films, Phys. Rev. Lett. **101**, 137201 (2008).
- [11] Tao Wu, Alexandre Bur, Ping Zhao, Kotekar P. Mohanchandra, Kin Wong, Kang L. Wang, Christopher S. Lynch, and Gregory P. Carman, Giant electric-field-induced reversible and permanent magnetization reorientation on magnetoelectric Ni/(011)[Pb(Mg<sub>1/3</sub>Nb<sub>2/3</sub>)O<sub>3</sub>]<sub>(1-x)</sub>–[PbTiO<sub>3</sub>]<sub>x</sub> heterostructure, Appl. Phys. Lett. **98**, 012504 (2011).

- [12] Jia-Mian Hu, Tiannan Yang, Jianjun Wang, Houbing Huang, Jinxing Zhang, Long-Qing Chen, and Ce-Wen Nan, Purely electric-field-driven perpendicular magnetization reversal, Nano Lett. 15, 616 (2015).
- [13] Ren-Ci Peng, Jia-Mian Hu, Kasra Momeni, Jian-Jun Wang, Long-Qing Chen, and Ce-Wen Nan, Fast 180° magnetization switching in a strain-mediated multiferroic heterostructure driven by a voltage, Sci. Rep. 6, 27561 (2016).
- [14] Nickvash Kani, John T. Heron, and Azad Naeemi, Strain-mediated magnetization reversal through spin-transfer torque, IEEE Trans. Magn. 53, 1 (2017).
- [15] Nicolas Tiercelin, Yannick Dusch, Alexey Klimov, Stefano Giordano, Vladimir Preobrazhensky, and Philippe Pernod, Room temperature magnetoelectric memory cell using stress-mediated magnetoelastic switching in nanostructured multilayers, Appl. Phys. Lett. 99, 192507 (2011).
- [16] Nicolas Tiercelin, Yannick Dusch, Vladimir Preobrazhensky, and Philippe Pernod, Magnetoelectric memory using orthogonal magnetization states and magnetoelastic switching, J. Appl. Phys. **109**, 07D726 (2011).
- [17] A. R. Mellnik, J. S. Lee, A. Richardella, J. L. Grab, P. J. Mintun, Mark H. Fischer, Abolhassan Vaezi, Aurelien Manchon, E.-A. Kim, and N. Samarth *et al.*, Spin-transfer torque generated by a topological insulator, Nature 511, 449 (2014).
- [18] A. A. Burkov and D. G. Hawthorn, Spin and Charge Transport on the Surface of a Topological Insulator, Phys. Rev. Lett. 105, 066802 (2010).
- [19] M. Z. Hasan and C. L. Kane, Colloquium: Topological insulators, Rev. Mod. Phys. 82, 3045 (2010).
- [20] A. A. Burkov and D. G. Hawthorn, Spin and Charge Transport on the Surface of a Topological Insulator, Phys. Rev. Lett. **105**, 066802 (2010).
- [21] William Witczak-Krempa, Gang Chen, Yong Baek Kim, and Leon Balents, Correlated quantum phenomena in the strong spin-orbit regime, Annu. Rev. Condens. Matter Phys. 5, 57 (2014).
- [22] Guang-Yu Guo, Shuichi Murakami, T.-W. Chen, and Naoto Nagaosa, Intrinsic Spin Hall Effect in Platinum: First-Principles Calculations, Phys. Rev. Lett. 100, 096401 (2008).
- [23] Axel Hoffmann, Spin Hall effects in metals, IEEE Trans. Magn. 49, 5172 (2013).
- [24] Nguyen Huynh Duy Khang, Yugo Ueda, and Pham Nam Hai, A conductive topological insulator with large spin Hall effect for ultralow power spin-orbit torque switching, Nat. Mater. 17, 808 (2018).
- [25] Yang Lv, James Kally, Delin Zhang, Joon Sue Lee, Mahdi Jamali, Nitin Samarth, and Jian-Ping Wang, Unidirectional spin-Hall and Rashba-Edelstein magnetoresistance in topological insulator-ferromagnet layer heterostructures, Nat. Commun. 9, 111 (2018).
- [26] Manfred Fiebig, Revival of the magnetoelectric effect, J. Phys. D: Appl. Phys. 38, R123 (2005).

- [27] Mingzhong Wu and Axel Hoffmann, *Recent Advances in Magnetic Insulators: From Spintronics to Microwave Applications* (Academic Press, 2013), Vol. 64.
- [28] B. Heinrich, C. Burrowes, E. Montoya, B. Kardasz, E. Girt, Young-Yeal Song, Yiyan Sun, and Mingzhong Wu, Spin Pumping at the Magnetic Insulator (YIG)/Normal Metal (Au) Interfaces, Phys. Rev. Lett. 107, 066604 (2011).
- [29] Supriyo Datta and Biswajit Das, Electronic analog of the electro-optic modulator, Appl. Phys. Lett. 56, 665 (1990).
- [30] Kimberley C. Hall and Michael E. Flatté, Performance of a spin-based insulated gate field effect transistor, Appl. Phys. Lett. 88, 162503 (2006).
- [31] Supriyo Datta, Sayeef Salahuddin, and Behtash Behin-Aein, Non-volatile spin switch for Boolean and non-Boolean logic, Appl. Phys. Lett. 101, 252411 (2012).
- [32] Supriyo Datta, Vinh Quang Diep, and Behtash Behin-Aein, What constitutes a nanoswitch? A perspective, Emerg. Nanoelect. Devices 15 (2014).
- [33] Edurne Sagasta, Yasutomo Omori, Saül Vélez, Roger Llopis, Christopher Tollan, Andrey Chuvilin, Luis E. Hueso, Martin Gradhand, YoshiChika Otani, and Fèlix Casanova, Unveiling the mechanisms of the spin Hall effect in Ta, Phys. Rev. B 98, 060410 (2018).
- [34] Cüneyt Şahin and Michael E. Flatté, Tunable Giant Spin Hall Conductivities in a Strong Spin-Orbit Semimetal:  $Bi_{1-x}Sb_x$ , Phys. Rev. Lett. **114**, 107201 (2015).
- [35] Michael E. Flatté, Voltage-driven magnetization control in topological insulator/magnetic insulator heterostructures, AIP. Adv. 7, 055923 (2017).
- [36] William Witczak-Krempa, Gang Chen, Yong Baek Kim, and Leon Balents, Correlated quantum phenomena in the strong spin-orbit regime, Annu. Rev. Condens. Matter Phys. 5, 57 (2014).
- [37] A. R. Mellnik, J. S. Lee, A. Richardella, J. L. Grab, P. J. Mintun, M. H. Fischer, A. Vaezi, A. Manchon, E. A. Kim, N. Samarth, and D. C. Ralph, Spin-transfer torque generated by a topological insulator, Nature 511, 449 (2014).
- [38] Yabin Fan, Pramey Upadhyaya, Xufeng Kou, Murong Lang, So Takei, Zhenxing Wang, Jianshi Tang, Liang He, Li-Te Chang, Mohammad Montazeri, Guoqiang Yu, Wanjun Jiang, Tianxiao Nie, Robert N. Schwartz, Yaroslav Tserkovnyak, and Kang L. Wang, Magnetization switching through giant spin-orbit torque in a magnetically doped topological insulator heterostructure, Nat. Mater. 13, 699 (2014).
- [39] Isaak D. Mayergoyz, Giorgio Bertotti, and Claudio Serpico, Nonlinear Magnetization Dynamics in Nanosystems (Elsevier, 2009).
- [40] A. D. Kent, B. Ozyilmaz, and E. del Barco, Spin-transfer-induced precessional magnetization reversal, Appl. Phys. Lett. 84, 3897 (2004).
- [41] D. Pinna, C. A. Ryan, T. Ohki, and A. D. Kent, Reliable spin-transfer torque driven precessional magnetization reversal with an adiabatically decaying pulse, Phys. Rev. B 93, 184412 (2016).
- [42] G. E. Rowlands, C. A. Ryan, L. Ye, L. Rehm, D. Pinna, A. D. Kent, and T. A. Ohki, A cryogenic spin-torque memory element with precessional magnetization dynamics, Sci. Rep. 9, 803 (2019).

- [43] H. Liu, D. Bedau, J. Z. Sun, S. Mangin, E. E. Fullerton, J. A. Katine, and A. D. Kent, Dynamics of spin torque switching in all-perpendicular spin valve nanopillars, J. Magn. Magn. Mater. 358, 233 (2014).
- [44] D. Bedau, H. Liu, J. Z. Sun, J. A. Katine, E. E. Fullerton, S. Mangin, and A. D. Kent, Spin-transfer pulse switching: From the dynamic to the thermally activated regime, Appl. Phys. Lett. 97, 262502 (2010).
- [45] J. J. Nowak, R. P. Robertazzi, J. Z. Sun, G. Hu, David W. Abraham, P. L. Trouilloud, S. Brown, M. C. Gaidis, E. J. O'Sullivan, and W. J. Gallagher *et al.*, Demonstration of ultralow bit error rates for spin-torque magnetic random-access memory with perpendicular magnetic anisotropy, IEEE Magn. Lett. 2, 3000204 (2011).
- [46] Rangharajan Venkatesan, Swagath Venkataramani, Xuanyao Fong, Kaushik Roy, and Anand Raghunathan, Spintastic: Spin-based stochastic logic for energy-efficient computing, in Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (EDA Consortium, 2015), pp. 1575–1578.
- [47] Fabian Oboril, Azadeh Shirvanian, and Mehdi Tahoori, Fault tolerant approximate computing using emerging nonvolatile spintronic memories, in 2016 IEEE 34th VLSI Test Symposium (VTS) (IEEE, Las Vegas, NV, USA, 2016), pp. 1–1.
- [48] Ameya D. Patil, Sasikanth Manipatruni, Dmitri Nikonov, Ian A. Young, and Naresh R. Shanbhag, Shannon-inspired statistical computing to enable spintronics, arXiv:1702.06119 (2017).
- [49] Wang Kang, Zhaohao Wang, Youguang Zhang, Jacques-Olivier Klein, Weifeng Lv, and Weisheng Zhao, Spintronic logic design methodology based on spin Hall effect–driven magnetic tunnel junctions, J. Phys. D: Appl. Phys. 49, 065008 (2016).
- [50] Sasikanth Manipatruni, Dmitri E. Nikonov, and Ian A. Young, Material Targets for Scaling All-Spin Logic, Phys. Rev. Appl. 5, 014002 (2016).
- [51] Sasikanth Manipatruni, Dmitri E. Nikonov, Chia-Ching Lin, Tanay A. Gosavi, Huichu Liu, Bhagwati Prasad, Yen-Lin Huang, Everton Bonturim, Ramamoorthy Ramesh, and Ian A. Young, Scalable energy-efficient magnetoelectric spin–orbit logic, Nature 565, 35 (2019).
- [52] William T. Coffey and Yuri P. Kalmykov, Thermal fluctuations of magnetic nanoparticles: Fifty years after Brown, J. Appl. Phys. **112**, 121301 (2012).
- [53] Arun Parthasarathy and Shaloo Rakheja, Reversal time of jump-noise magnetization dynamics in nanomagnets via Monte Carlo simulations, J. Appl. Phys. 123, 223901 (2018).
- [54] While a lumped interconnect model provides a pessimistic value of interconnect latency, we choose this model for its simplicity and ease of analytic calculations. Moreover, for the VTOPSS, interconnect latency is significantly smaller than that associated with spin accumulation and reversal of the MI layer. Therefore, the error introduced by a lumped model is negligible.
- [55] A. F. Mayadas and M. Shatzkes, Electrical-resistivity model for polycrystalline films: The case of arbitrary reflection at external surfaces, Phys. Rev. B 1, 1382 (1970).

- [56] R. Clasen, P. Grosse, A. Krost, F. Levy, S. F. Marenkin, W. Richter, N. Ringelstein, R. Schmechel, G. Weiser, Hea Werheit *et al.*, Non-tetrahedrally bonded elements and binary compounds I, Landolt-Bornstein, New Series, Group III/17 (1998).
- [57] Matthew Brahlek, Nikesh Koirala, Namrata Bansal, and Seongshik Oh, Transport properties of topological insulators: Band bending, bulk metal-to-insulator transition, and weak anti-localization, Solid State Commun. 215, 54 (2015).
- [58] E. K. de Vries, S. Pezzini, M. J. Meijer, N. Koirala, M. Salehi, J. Moon, S. Oh, S. Wiedmann, and Tamalika Banerjee, Coexistence of bulk and surface states probed by Shubnikov–de Haas oscillations in Bi<sub>2</sub>Se<sub>3</sub> with high charge-carrier density, Phys. Rev. B **96**, 045433 (2017).
- [59] Matthew Brahlek, Nikesh Koirala, Maryam Salehi, Namrata Bansal, and Seongshik Oh, Emergence of Decoupled Surface Transport Channels in Bulk Insulating Bi<sub>2</sub>Se<sub>3</sub> Thin Films, Phys. Rev. Lett. **113**, 026801 (2014).
- [60] Maryam Salehi, Hassan Shapourian, Nikesh Koirala, Matthew J. Brahlek, Jisoo Moon, and Seongshik Oh, Finite-size and composition-driven topological phase transition in  $(Bi_{1-x}In_x)_2Se_3$  thin films, Nano Lett. **16**, 5528 (2016).
- [61] Shoji Ikeda, Jun Hayakawa, Young Min Lee, Ryutaro Sasaki, Toshiyasu Meguro, Fumihiro Matsukura, and Hideo Ohno, Dependence of tunnel magnetoresistance in MgO based magnetic tunnel junctions on Ar pressure during MgO sputtering, Jpn. J. Appl. Phys. 44, L1442 (2005).
- [62] Subho Chatterjee, Mitchelle Rasquinha, Sudhakar Yalamanchili, and Saibal Mukhopadhyay, A scalable design methodology for energy minimization of STTRAM: A circuit and architecture perspective, IEEE Trans. Very Large Scale Integration (VLSI) Syst. 19, 809 (2011).

- [63] H. B. Bakoglu and James D. Meindl, Optimal interconnection circuits for VLSI, IEEE Trans. Electron Devices 32, 903 (1985).
- [64] Linda Wilson, International technology roadmap for semiconductors (ITRS), Semicond. Industry Assoc. (2013).
- [65] Behtash Behin-Aein, Deepanjan Datta, Sayeef Salahuddin, and Supriyo Datta, Proposal for an all-spin logic device with built-in memory, Nat. Nanotechnol. 5, 266 (2010).
- [66] Supriyo Datta, Sayeef Salahuddin, and Behtash Behin-Aein, Non-volatile spin switch for Boolean and non-Boolean logic, Appl. Phys. Lett. **101**, 252411 (2012).
- [67] Shaloo Rakheja, Sou-Chi Chang, and Azad Naeemi, Impact of dimensional scaling and size effects on spin transport in copper and aluminum interconnects, IEEE. Trans. Electron Devices 60, 3913 (2013).
- [68] Nickvash Kani and Azad Naeemi, Analytical models for coupling reliability in identical two-magnet systems during slow reversals, J. Appl. Phys. **122**, 223902 (2017).
- [69] Nickvash Kani, Shaloo Rakheja, and Azad Naeemi, Analytic modeling of dipolar field requirements for robust coupling in a non-identical biaxial two-magnet system, J. Appl. Phys. 124, 023901 (2018).
- [70] Nikhil Rangarajan, Arun Parthasarathy, Nickvash Kani, and Shaloo Rakheja, Energy-efficient computing with probabilistic magnetic bits—Performance modeling and comparison against probabilistic CMOS logic, IEEE. Trans. Magn. 53, 1 (2017).
- [71] Yang Xie and Ankur Srivastava, Mitigating sat attack on logic locking, in International Conference on Cryptographic Hardware and Embedded Systems (Springer, 2016), pp. 127–146.
- [72] Huanyu Wang, Domenic Forte, Mark M. Tehranipoor, and Qihang Shi, Probing attacks on integrated circuits: Challenges and research opportunities, IEEE Des. Test 34, 63 (2017).