# A 130 nm 1.2 V/3.3 V 16 Kb Spin-Transfer Torque Random Access Memory With Nondestructive Self-Reference Sensing Scheme

Yiran Chen, Member, IEEE, Hai Li, Member, IEEE, Xiaobin Wang, Member, IEEE, Wenzhong Zhu, Wei Xu, and Tong Zhang, Senior Member, IEEE

Abstract—Among the emerging memories, Spin-Transfer Torque Random Access Memory (STT-RAM) has demonstrated many promising features such as fast access speed, nonvolatility, excellent scalability, and compatibility with the CMOS process. However, the large process variations of both the magnetic tunneling junction (MTJ) and the MOS transistors in scaled technologies severely limit the yield of STT-RAM chips. In this work, we propose a new sensing scheme for STT-RAM, named nondestructive self-reference sensing (NSRS). By leveraging the different dependencies of the high and low resistance states of MTJs on the cell current amplitude, the proposed NSRS technique works well even when the bit-to-bit variation of MTJ resistances is large. Furthermore, we propose three combined magnetic- and circuit-level techniques, namely R-I curve skewing, yield-driven cell current selection, and ratio matching, to further improve the sense margin and robustness of NSRS. The measurement results of a 16 Kb STT-RAM test chip show that the proposed technique can reliably read out all the measured memory bits, whereas a 10% read failure rate was observed with the conventional sensing technique. The three enhancement techniques ensure a 20 mV minimum sense margin, and the whole sensing process completes within 15 ns.

*Index Terms*—Spin-torque, STT-RAM, MRAM, self-reference sensing scheme.

#### I. INTRODUCTION

THE migration trend of microprocessors toward multi-core architectures generates explosive demand for on-chip embedded memory [1]. However, conventional memory technologies, e.g., SRAM, DRAM, and Flash memory, face significant challenges as the technology node keeps scaling down [22], [23], [27]. In recent years, Spin-Transfer Torque Random Access Memory (STT-RAM), an improved descendant of magnetic random access memory (MRAM), has attracted increasing attention for its unique characteristics such as non-volatility, simple cell structure, fast read/write speed (<10 ns), high endurance, high array density, and excellent CMOS compatibility and scalability [6], [7], [24], [25].

Manuscript received May 17, 2011; revised September 19, 2011; accepted September 19, 2011. Date of publication November 07, 2011; date of current version January 27, 2012. This paper was approved by Associate Editor Peter Gillingham.

Y. Chen is with the Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA 15261 USA.

H. Li is with the Department of Electrical and Computer Engineering, Polytechnic Institute of New York University, Brooklyn, NY 11201 USA (e-mail: hli@poly.edu).

X. Wang and W. Zhu are with Seagate Technology, Bloomington, MN 55435 USA.

W. Xu and T. Zhang are with the Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2011.2170778

In an MRAM bit, data is stored as one of two or more resistance states of a magnetic tunneling junction (MTJ) device. The conventional MRAM technology uses a current-induced magnetic field to flip the magnetization of an MTJ and therefore change its resistance state [3], [4]. Although some commercial products based on the conventional MRAM technology have become available [5], the amplitude of the required switching magnetic field increases rapidly as the MTJ size shrinks. Consequently, the scalability of the conventional MRAM technology is severely limited by the high switching current of the MRAM bits and the incurred disturbance to the adjacent bits.

The invention of STT-RAM makes it possible for MRAM technology to continue scaling down. Different from the conventional MRAM technology, the magnetization of the MTJ in an STT-RAM bit is flipped by a spin-polarized current, which means that its switching mechanism is constrained locally. Moreover, the scalability of STT-RAM technology is guaranteed by the fact that the magnitude of switching current is proportional to the MTJ cell area: when the MTJ size decreases, the required switching current reduces accordingly [6], [7]. In the past several years, many STT-RAM designs have been demonstrated [6], [7], [24], [25].

Although the physical mechanism and the manufacturing and integration technologies of STT-RAM have been extensively investigated, process variation control remains a major gating factor preventing STT-RAM technology from entering mass production. Due to quantum mechanical tunneling, the resistance of an MTJ has an exponential dependence on the thickness of the oxide barrier between the two magnetic layers (more details on the MTJ structure are given in Section II-A) [9], [10]. Tehrani *et al.* [9] reported that MTJ resistance increases by 8% when the thickness of oxide barrier changes by ~1% at oxide thickness of around 0.1 nm. Moreover, the variation of MTJ resistance will be aggravated by the further reduction of oxide barrier thickness and the large MTJ geometry variation in scaled technologies.

In the conventional read scheme, the data of an STT-RAM bit is read out by comparing the MTJ resistance to a reference value. When the MTJ resistance variation is large, the two resistance states of some MTJs may both be higher or both be lower than the reference value. As a result, these memory bits are constantly detected as 1's or 0's. Based on the fact that the high-resistance state of an MTJ is always higher than its own low-resistance state under any conditions, some self-reference read schemes were proposed that compare the original value in an MTJ to a reference value written into the same MTJ [8], [11]. Besides the high power consumption and the long access latency incurred by two write steps (overwriting the original value with a reference value and writing the readout value back to the MTJ), such a *destructive* scheme also raises chip reliability concerns from the viewpoints of non-volatility and endurance cycles.

Fig. 1. MTJ structure. (a) Anti-parallel (high resistance state). (b) Parallel (low resistance state). (c) 1T1J STT-RAM cell structure. (d) 2T1J STT-RAM cell structure.

In this paper, we propose a *nondestructive self-reference* sensing scheme (NSRS) to overcome the bit-to-bit variations in STT-RAM design. Although NSRS still includes two sense steps, as the conventional self-reference sensing scheme (CSRS) does, the two costly write steps are eliminated. NSRS significantly improves the memory reliability and reduces the read latency and the power consumption. In addition, we propose three combined magnetic- and circuit-level techniques to enhance the sense margin and device variation tolerance of NSRS: (1) an R-I curve skewing technique to safely increase the cell current amplitude without degrading the reliability of STT-RAM bits; (2) a yield-driven cell current selection flow to further increase the cell current amplitude by sacrificing the disturbance probability of STT-RAM bits in a quantitative way; and (3) a ratio matching technique to adjust the cell current ratio in the post-manufacture stage and overcome the process variation in the voltage divider. A 16 Kb STT-RAM test chip with an STT-RAM cell size of 4.57 $\mu$m<sup>2</sup> was fabricated in a 130 nm front-end-of-line technology [18] and a 65 nm MTJ process. The measurement results show that NSRS can reliably read out all the measured memory bits, whereas a 10% read failure rate was observed with the conventional sensing technique. A 20 mV minimum sense margin of NSRS is achieved after applying the three enhancement techniques.

The rest of this paper is organized as follows: Section II gives the basics of STT-RAM and summarizes its sensing schemes; Section III describes the design concept of the proposed nondestructive self-reference scheme; Section IV illustrates how the three proposed techniques improve the robustness of NSRS; Section V presents the circuit design details of our 16 Kb test chip and the measurement results; finally, Section VI concludes our work.

Fig. 2. The measured R-I sweep curve of an MgO-based MTJ.

#### II. PRELIMINARY

#### A. Basics of MTJ and STT-RAM Cell

A magnetic tunneling junction (MTJ) includes two ferromagnetic layers and one oxide barrier layer, e.g., MgO. As shown in Fig. 1(a) and (b), when the magnetization directions of the two ferromagnetic layers are anti-parallel or parallel, MTJ is in high or low resistance state, respectively. In STT-RAM designs, the magnetization direction of one ferromagnetic layer (reference layer) is fixed while the magnetization direction of the other ferromagnetic layer (free layer) can be changed by passing a switching current polarized by the magnetization of reference layer [6].

Fig. 2 illustrates the measured R-I sweep curve of an MgO-based MTJ with a 90 nm $\times$ 180 nm ellipse shape. The applied voltage pulse width is 40 ns. The missing points from the 40 ns pulse measurement are extrapolated from the DC (static) measurement. When applying a positive voltage on point B in Fig. 1(a), MTJ enters the positive voltage region in Fig. 2 and switches from high resistance state to low resistance state. On the contrary, when applying a positive voltage on point A, MTJ enters the negative voltage region and switches in the opposite way.


Let $R_{\rm L}$ and $R_{\rm H}$ denote the low and the high MTJ resistances, respectively. We define the tunneling magnetoresistance ratio as TMR = $(R_{\rm H} - R_{\rm L})/R_{\rm L}$. As shown in Fig. 2, $R_{\rm H}$, $R_{\rm L}$ and TMR depend on the cell current. In general, a larger TMR makes it easier to distinguish the two resistance states of an MTJ. MgO-based MTJs are widely used in present-day STT-RAM designs because of their relatively higher TMR (>100%) compared with other materials, e.g., AlO (<30%).

One-transistor-one-MTJ (or 1T1J) in Fig. 1(c) [6], [7] and two-transistor-one-MTJ (2T1J) in Fig. 1(d) [25], [28], [29] are two popular STT-RAM cell designs. In this work, we use the 1T1J structure to demonstrate the effectiveness of the proposed NSRS. The same design concept can be applied to 2T1J STT-RAM designs as well.

As shown in Fig. 1(c), in a 1T1J STT-RAM cell, one MTJ is connected to one NMOS transistor in series. The interconnects connected to the MTJ, to the source/drain of the NMOS transistor, and to its gate are called the bit-line (BL), source-line (SL) and word-line (WL), respectively. The MTJ is usually modeled as a variable resistor in the circuit schematic [12].

#### B. Conventional Voltage Sensing Scheme (CVSS)

Fig. 3 illustrates a conventional voltage sensing scheme used in STT-RAM design, where a read current  $I_{\rm R}$  is applied to generate the BL voltage [13]:

$$\begin{aligned} V_{\rm BL,L} &= I_{\rm R} \cdot (R_{\rm L} + R_{\rm TR}) && \text{when the MTJ is at the low resistance state, or} \\ V_{\rm BL,H} &= I_{\rm R} \cdot (R_{\rm H} + R_{\rm TR}) && \text{when the MTJ is at the high resistance state.} \end{aligned}$$
(1)

Here $V_{\rm BL,L}$ and $V_{\rm BL,H}$ are the BL voltages when the MTJ is at the low and high resistance states, respectively. $R_{\rm TR}$ represents the resistance of the NMOS transistor. By comparing the BL voltage to a reference voltage $V_{\rm REF}$ between $V_{\rm BL,L}$ and $V_{\rm BL,H}$, the MTJ resistance state can be read out. When a $V_{\rm REF}$ is shared by multiple STT-RAM cells, it needs to satisfy:

$$Max(V_{BL,L}) < V_{REF} < Min(V_{BL,H}).$$
(2)

Here $Max(V_{BL,L})$ and $Min(V_{BL,H})$ denote the maximal $V_{BL,L}$ and the minimal $V_{BL,H}$ generated by all the involved STT-RAM cells, respectively. Unfortunately, $Max(V_{BL,L}) < Min(V_{BL,H})$ may not remain true when the bit-to-bit variation of MTJ resistance is large.
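The condition in (2) can be checked numerically for a population of cells. The following is a minimal sketch, not the authors' tooling: the function name `shared_reference_window` and all resistance and current values are hypothetical, and the transistor resistance is assumed identical for every cell.

```python
def shared_reference_window(r_low, r_high, r_tr, i_read):
    """Return the (lower, upper) bounds for a shared V_REF per Eq. (2), or None.

    r_low, r_high: per-cell MTJ resistances (ohms) in the low/high state.
    r_tr: access-transistor resistance (ohms), assumed equal for all cells.
    i_read: read current (amperes).
    """
    max_v_low = max(i_read * (r + r_tr) for r in r_low)
    min_v_high = min(i_read * (r + r_tr) for r in r_high)
    if max_v_low < min_v_high:
        return max_v_low, min_v_high
    return None  # no single reference voltage separates all cells


# Hypothetical populations: a tight distribution and one with wide variation.
tight = shared_reference_window([1000, 1050], [2000, 2100], 500, 100e-6)
wide = shared_reference_window([1000, 1900], [1800, 2100], 500, 100e-6)
print(tight, wide)
```

In the second population, the low state of one cell is more resistive than the high state of another, so no shared reference exists and the function returns `None`.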

#### C. Conventional Self-Reference Scheme (CSRS)

To overcome the bit-to-bit variation of MTJ resistances, a so-called "self-reference" sensing scheme, shown in Fig. 4, was proposed [11]. The operation of the conventional self-reference scheme (CSRS) can be summarized as:

*First read:* A read current  $I_{\rm R1}$  is applied to generate BL voltage  $V_{\rm BL1}$ , which is stored in a capacitor C1.  $V_{\rm BL1}$  can be either  $V_{\rm BL,L1}$  or  $V_{\rm BL,H1}$ , which are the BL voltages when the MTJ is at the low resistance state or the high resistance state, respectively;

*Erase:* Data "0" is written into the same STT-RAM memory cell;

*Second read:* Another read current $I_{R2}(>I_{R1})$ is applied to generate BL voltage $V_{BL2}$, which is stored in capacitor C2. Here $I_{R2}$ is carefully chosen to make sure:

$$V_{\rm BL,L1} < V_{\rm BL2} < V_{\rm BL,H1}.$$
 (3)

The original value of the STT-RAM bit can be read out by comparing $V_{\rm BL1}$ and $V_{\rm BL2}$.

*Write back:* Write the readout data in the previous step back to the STT-RAM bit.

CSRS requires two write operations. When power supply fluctuations occur during the sensing process, the stored data in the STT-RAM bit may be lost. Moreover, the introduction of the write operations significantly degrades the endurance lifetime of the STT-RAM cells.

#### III. NONDESTRUCTIVE SELF-REFERENCE SCHEME (NSRS)

#### A. Design Concept

We noticed that the dependencies of the high and the low resistance states of an MTJ on the cell current are quite different: the current roll-off slope of the high resistance state is much steeper than that of the low resistance state, as shown in Fig. 2. This special characteristic of MgO-based MTJs is the motivation of the proposed nondestructive self-reference scheme (NSRS).

Fig. 5. Schematic of nondestructive self-reference circuitry.

Fig. 6. Design of nondestructive self-reference scheme. (a) Design concept. (b) Real design.

Fig. 5 illustrates the conceptual schematic of NSRS. A switch transistor $(M_{S1})$ is connected to BL as well as the corresponding voltage storage element C1. The other switch transistor $(M_{S2})$ is connected to a voltage divider. Control signals SLT1 and SLT2 are used to turn the two switch transistors on and off. The two inputs of a voltage sense amplifier are connected to the top terminal of C1 and the output of the voltage divider $(V_{BL2O})$, respectively. The voltage ratio of the voltage divider is $\alpha = V_{BL2O}/V_{BL2}$.

Fig. 6(a) illustrates the NSRS operation:

*First read:* A read current  $I_{R1}$  is applied to generate BL voltage  $V_{BL1}$ , which is stored in C1.  $V_{BL1}$  can be either  $V_{BL,L1}$  or  $V_{BL,H1}$ , which are the BL voltages when the MTJ is at the low resistance state or the high resistance state, respectively;

*Second read:* Another read current $I_{\rm R2}$, which is larger than $I_{\rm R1}$, is applied and generates BL voltage $V_{\rm BL2}$. We define the read current ratio $\beta = I_{\rm R2}/I_{\rm R1}$.

*Sensing:* $V_{\rm BL1}$ and $V_{\rm BL2O}$ are compared by the voltage sense amplifier. If $V_{\rm BL1}$ is significantly larger than $V_{\rm BL2O}$, the original value of STT-RAM bit is "1" (high resistance state). Otherwise, the original value of STT-RAM bit is "0" (low resistance state). The explanation is as follows. If the original value of STT-RAM bit is "1", we have:

$$V_{\rm BL1} = V_{\rm BL,H1} = I_{\rm R1} \cdot (R_{\rm H1} + R_{\rm TR1})$$
  
$$V_{\rm BL2} = V_{\rm BL,H2} = I_{\rm R2} \cdot (R_{\rm H2} + R_{\rm TR2}).$$
(4)

If we set  $\alpha = 1/\beta$ , then

$$V_{\rm BL2O} = V_{\rm BL,H2O} = \alpha \cdot V_{\rm BL2} = \frac{I_{\rm R1}}{I_{\rm R2}} \cdot I_{\rm R2} \cdot (R_{\rm H2} + R_{\rm TR2}) = I_{\rm R1} \cdot (R_{\rm H2} + R_{\rm TR2}).$$
(5)

Here $R_{\rm H1}$ and $R_{\rm H2}$ are the resistances of the MTJ at the high resistance state, under cell current $I_{\rm R1}$ and $I_{\rm R2}$, respectively; and $V_{\rm BL,H1}$ and $V_{\rm BL,H2}$ are the corresponding BL voltages, respectively. $V_{\rm BL,H2O}$ is the output of the voltage divider when the MTJ is at the high resistance state, under $I_{\rm R2}$. $R_{\rm TR1}$ and $R_{\rm TR2}$ are the resistances of the NMOS transistor, under cell current $I_{\rm R1}$ and $I_{\rm R2}$, respectively. Assuming $R_{\rm TR1} = R_{\rm TR2} = R_{\rm TR}$, the sense margin for "1" is

$$\Delta V_{\rm BL,H} = V_{\rm BL,H1} - V_{\rm BL,H2O} = I_{\rm R1} \cdot (R_{\rm H1} - R_{\rm H2}) > 0$$
(6)

because  $R_{\rm H1}$  is significantly larger than  $R_{\rm H2}$ .

Similarly, if the original value of STT-RAM bit is "0", we have the sense margin for "0"

$$\Delta V_{\rm BL,L} = V_{\rm BL,L1} - V_{\rm BL,L2O} = I_{\rm R1} \cdot (R_{\rm L1} - R_{\rm L2}) \approx 0.$$
(7)

Here $R_{L1}$ and $R_{L2}$ are the resistances of the MTJ at the low resistance state, under cell current $I_{R1}$ and $I_{R2}$, respectively. $R_{L1}$ is close to $R_{L2}$ (see Fig. 2). $V_{BL,L1}$ and $V_{BL,L2}$ are the BL voltages when the MTJ resistance equals $R_{L1}$ or $R_{L2}$, respectively. $V_{BL,L2O}$ is the output of the voltage divider when the MTJ is at the low resistance state, under $I_{R2}$.

Our proposed self-reference scheme is *nondestructive* because it does not include any write operations. Therefore, the total read latency and power consumption are dramatically reduced compared to CSRS.

In practice, designing a circuit to detect whether two values, e.g., $V_{\rm BL,L1}$ and $V_{\rm BL,L2O}$, are "equal" is very difficult. Instead, we carefully chose $I_{\rm R1}$ to ensure $V_{\rm BL,H1} > V_{\rm BL,H2O} > V_{\rm BL,L2O} > V_{\rm BL,L1}$. The STT-RAM bit is "1" when $V_{\rm BL1} > V_{\rm BL2O}$; otherwise it is "0", as shown in Fig. 6(b). We define the largest allowable read current that does not disturb the MTJ state as $I_{\rm Rmax}$. To maximize the sense margin, we usually choose $I_{\rm R2} = I_{\rm Rmax}$.
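The two-read comparison can be illustrated with a toy behavioral model. This is a sketch under strong assumptions, not the test-chip circuit: the MTJ resistance is assumed to roll off linearly with cell current, and every numeric value (`r_h0`, `r_l0`, `k_h`, `k_l`, `r_tr`, the currents, and $\alpha = 0.52$) is hypothetical. The transistor resistance is taken as equal in both reads.

```python
def mtj_resistance(state, current, r_h0=2500.0, r_l0=1000.0,
                   k_h=2.0e6, k_l=1.0e5):
    """Toy MTJ model: resistance (ohms) falls linearly with cell current (A).

    The high state (state == 1) rolls off much faster than the low state,
    which is the property NSRS relies on.
    """
    if state == 1:
        return r_h0 - k_h * current
    return r_l0 - k_l * current


def nsrs_read(state, i_r2=200e-6, beta=2.0, alpha=0.52, r_tr=500.0):
    """Return (decoded_bit, v_bl1 - v_bl2o) for a cell storing `state`."""
    i_r1 = i_r2 / beta
    v_bl1 = i_r1 * (mtj_resistance(state, i_r1) + r_tr)   # first read, held on C1
    v_bl2 = i_r2 * (mtj_resistance(state, i_r2) + r_tr)   # second read
    v_bl2o = alpha * v_bl2                                 # voltage divider output
    return (1 if v_bl1 > v_bl2o else 0), v_bl1 - v_bl2o


for stored in (0, 1):
    bit, margin = nsrs_read(stored)
    print(stored, bit, round(margin * 1e3, 2), "mV")
```

With $\alpha \cdot \beta$ slightly above 1, the stored "1" yields a positive difference and the stored "0" a negative one, so a plain comparator suffices and no write step is needed.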

The optimal  $\beta$  that ensures the equal sense margins when the MTJ is at both high- and low-resistance states (or  $|V_{\rm BL,H1} - V_{\rm BL,H2O}| = |V_{\rm BL,L2O} - V_{\rm BL,L1}|$ ) can be calculated by solving:

$$\frac{1}{\alpha \cdot \beta} = \frac{R_{\text{H2}} + R_{\text{L2}} + 2R_{\text{TR2}}}{R_{\text{H2}} + R_{\text{L2}} + (\Delta R_{\text{Hmax}} + \Delta R_{\text{Lmax}}) \cdot \left(1 - \frac{1}{\beta}\right) + 2R_{\text{TR1}}}.$$
(8)

Here  $\Delta R_{\text{Hmax}}$  (or  $\Delta R_{\text{Lmax}}$ ) denotes the resistance difference when an MTJ at the high (or low) resistance state is read at a close-to-zero read current and  $I_{\text{Rmax}}$ , as shown in Fig. 6. Our later analysis in Section IV-C shall show that  $\alpha \cdot \beta$  varies in the range from 1 to 1.3 based on the MTJ devices shown in Fig. 2. The exact value is determined by many design factors, such as the transistor sizing, the selection of the two read currents, etc.
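The equal-margin condition can be evaluated directly: setting $V_{\rm BL,H1} - V_{\rm BL,H2O} = V_{\rm BL,L2O} - V_{\rm BL,L1}$ gives $\alpha \cdot \beta = (R_{\rm H1} + R_{\rm L1} + 2R_{\rm TR1})/(R_{\rm H2} + R_{\rm L2} + 2R_{\rm TR2})$. The sketch below assumes $R_{\rm TR1} = R_{\rm TR2}$; the function name and the resistance values are hypothetical and are not taken from the measured device.

```python
def equal_margin_alpha_beta(r_h2, r_l2, r_tr, d_rh_max, d_rl_max, beta):
    """alpha*beta that balances the '1' and '0' sense margins, per Eq. (8).

    Assumes R_TR1 = R_TR2 = r_tr. All resistances are in ohms.
    """
    scale = 1.0 - 1.0 / beta
    numerator = r_h2 + r_l2 + (d_rh_max + d_rl_max) * scale + 2.0 * r_tr
    denominator = r_h2 + r_l2 + 2.0 * r_tr
    return numerator / denominator


# Hypothetical cell: R_H2 = 2100, R_L2 = 980, R_TR = 500,
# dR_Hmax = 400, dR_Lmax = 20 (ohms), beta = 2.
ab = equal_margin_alpha_beta(2100.0, 980.0, 500.0, 400.0, 20.0, 2.0)
print(ab)
```

For these illustrative numbers the result falls inside the 1 to 1.3 range mentioned above; the actual value depends on the device and circuit parameters.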

#### B. Robustness Analysis of NSRS

Three main factors can significantly affect the effectiveness of NSRS: the variation of the read current ratio $\beta$, the shift of the NMOS transistor resistance $R_{\rm TR}$ under $I_{\rm R1}$ and $I_{\rm R2}$, and the variation of the voltage ratio $\alpha$. The robustness of NSRS can be measured by the allowed variation range of these three factors for a nonzero sense margin of an STT-RAM bit.

1) Robustness Analysis of Read Current Ratio  $\beta$ : The selection of read currents in NSRS needs to ensure:

$$\alpha I_{\rm R2} \cdot (R_{\rm H2} + R_{\rm TR2}) < I_{\rm R1} \cdot (R_{\rm H1} + R_{\rm TR1}), \qquad (9a)$$

$$I_{\rm R2} \cdot (R_{\rm L2} + R_{\rm TR2}) < I_{\rm R2} \cdot (R_{\rm H2} + R_{\rm TR2}),$$
 (9b)

$$I_{\rm R1} \cdot (R_{\rm L1} + R_{\rm TR1}) < \alpha I_{\rm R2} \cdot (R_{\rm L2} + R_{\rm TR2}).$$
 (9c)

Equation (9b) always holds. Since the NMOS transistor works in the linear region, approximately $R_{\text{TR1}} = R_{\text{TR2}} = R_{\text{TR}}$. Then a solution of $\beta$ can be found as long as:

$$1 + \frac{\Delta R_{\rm L}}{R_{\rm L2} + R_{\rm TR}} < \alpha \cdot \beta < 1 + \frac{\Delta R_{\rm H}}{R_{\rm H2} + R_{\rm TR}}.$$
 (10)

Here  $\Delta R_{\rm L} = R_{\rm L1} - R_{\rm L2} = \Delta R_{\rm Lmax} \cdot (1 - (1/\beta))$  and  $\Delta R_{\rm H} = R_{\rm H1} - R_{\rm H2} = \Delta R_{\rm Hmax} \cdot (1 - (1/\beta))$ . A valid selection of  $\beta$  exists only if:

$$\frac{\Delta R_{\rm Lmax}}{R_{\rm L2} + R_{\rm TR}} < \frac{\Delta R_{\rm Hmax}}{R_{\rm H2} + R_{\rm TR}}.$$
(11)

Equation (11) is always true for a normal MgO-based MTJ. For multiple STT-RAM bits, a valid  $\beta$  exists only if:

$$\operatorname{Max}\left(\frac{\Delta R_{\mathrm{Lmax}}}{R_{\mathrm{L2}} + R_{\mathrm{TR}}}\right) < \operatorname{Min}\left(\frac{\Delta R_{\mathrm{Hmax}}}{R_{\mathrm{H2}} + R_{\mathrm{TR}}}\right).$$
(12)

Normally the left side of (12) is close to zero since  $\Delta R_{\rm Lmax} \approx 0$ . Increasing the maximum read current  $I_{\rm R2}$  can effectively increase the right side of (12) by reducing  $R_{\rm H2}$  and increasing  $\Delta R_{\rm Hmax}$  simultaneously. In our design,  $(\Delta R_{\rm Hmax})/(R_{\rm H2} + R_{\rm TR})$  is within the range from 0.1 to 0.25.
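Conditions (10) and (12) amount to intersecting per-bit intervals for $\alpha \cdot \beta$. The following sketch computes that common window for a small population; the function name, the tuple layout, and the cell values are hypothetical, and $R_{\rm TR}$ is assumed equal for all cells and both read currents.

```python
def alpha_beta_window(cells, beta, r_tr):
    """Common valid interval for alpha*beta across cells, or None.

    cells: list of (r_h2, r_l2, d_rh_max, d_rl_max) tuples in ohms.
    Uses dR = dR_max * (1 - 1/beta) as in the text following Eq. (10).
    """
    scale = 1.0 - 1.0 / beta
    lower = max(1.0 + d_rl * scale / (r_l2 + r_tr)
                for _, r_l2, _, d_rl in cells)
    upper = min(1.0 + d_rh * scale / (r_h2 + r_tr)
                for r_h2, _, d_rh, _ in cells)
    return (lower, upper) if lower < upper else None


# Two hypothetical cells with different resistances.
cells = [(2100.0, 980.0, 400.0, 20.0), (2600.0, 1200.0, 500.0, 25.0)]
window = alpha_beta_window(cells, beta=2.0, r_tr=500.0)
print(window)
```

The lower bound is set by the cell with the largest low-state shift and the upper bound by the cell with the smallest high-state shift, which is why the window narrows as more bits share one setting.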

2) Robustness Analysis of NMOS Transistor Resistance $\Delta R_{TR}$: Although the NMOS transistor in a 1T1J STT-RAM cell works in the linear region, its equivalent resistance shifts under different read currents. In other words, $\Delta R_{TR} = R_{TR1} - R_{TR2} > 0$. Based on (9a)–(9c), $\Delta R_{TR}$ in NSRS needs to satisfy:

$$(\alpha\beta - 1) \cdot (R_{\mathrm{H}2} + R_{\mathrm{TR}2}) - \Delta R_{\mathrm{H}} < \Delta R_{\mathrm{TR}} < (\alpha\beta - 1) \cdot (R_{\mathrm{L}2} + R_{\mathrm{TR}2}) - \Delta R_{\mathrm{L}}.$$
(13)

3) Robustness Analysis of Voltage Ratio $\alpha$: Process variation also results in the deviation of the voltage ratio $\alpha$ of the voltage divider away from the designed value in NSRS. We assume $\Delta$ is the deviation of the voltage ratio from the designed value and $R_{\text{TR1}} = R_{\text{TR2}} = R_{\text{TR}}$. Replacing $\alpha$ in (9a)–(9c) by $\alpha(1 + \Delta)$, we have the range of $\Delta$ for the correct read operation as:

$$\frac{1}{\alpha\beta} \cdot \left(\frac{R_{\rm L1} + R_{\rm TR}}{R_{\rm L2} + R_{\rm TR}}\right) - 1 < \Delta < \frac{1}{\alpha\beta} \cdot \left(\frac{R_{\rm H1} + R_{\rm TR}}{R_{\rm H2} + R_{\rm TR}}\right) - 1.$$
(14)
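The tolerance in (14) is straightforward to evaluate once the four MTJ resistances are known. The sketch below uses a hypothetical function name and hypothetical resistance values; it only illustrates how the allowed relative deviation of the divider ratio follows from the design point $\alpha \cdot \beta$.

```python
def divider_tolerance(r_h1, r_h2, r_l1, r_l2, r_tr, alpha_beta):
    """Allowed relative deviation (lower, upper) of alpha per Eq. (14).

    Resistances in ohms; assumes R_TR1 = R_TR2 = r_tr.
    """
    lower = (r_l1 + r_tr) / (r_l2 + r_tr) / alpha_beta - 1.0
    upper = (r_h1 + r_tr) / (r_h2 + r_tr) / alpha_beta - 1.0
    return lower, upper


# Hypothetical cell and design point alpha*beta = 1.04.
tol = divider_tolerance(2300.0, 2100.0, 990.0, 980.0, 500.0, 1.04)
print(tol)
```

A design point that is valid under (10) always produces a negative lower limit and a positive upper limit, so the nominal divider ratio sits inside the tolerated range.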

#### IV. ENHANCEMENT TECHNOLOGIES FOR NONDESTRUCTIVE SELF-REFERENCE SCHEME

To further enhance the robustness of NSRS, we propose three new magnetic- and circuit-level techniques by improving the sense margin and device variation tolerance. The design details of these techniques are explained in this section.

#### A. R-I Curve Skewing

In NSRS, a large  $I_{R2}$ , i.e., the maximum allowable cell current, is desired. NSRS is based on the difference between the resistance changes of an MTJ at high and low resistance states, when the cell current varies from  $I_{R1}$  to  $I_{R2}$ . The larger  $I_{R2}$  is, the larger such a resistance difference is. Moreover, a large  $I_{R2}$ (as well as  $I_{R1}$ ) can generate a high bit-line voltage. However, a large  $I_{R2}$  may disturb the state of MTJ by stochastically flipping the magnetization direction of free layer. In theory, the disturbance probability (Pr<sub>sw</sub>) of an MTJ at a cell current of  $I_R$  can be expressed as [14]:

$$\Pr_{\rm sw} = 1 - \exp\left\{-\frac{t}{\tau}\exp\left[-\Delta\left(1 - \frac{I_{\rm R}}{I_{\rm C}}\right)\right]\right\}.$$
 (15)

Here, $t$ is the cell current pulse width. $\Delta$ is the magnetic memorizing energy without applying any current or magnetic field. $\tau$ is the inverse of the attempt frequency; $I_{\rm C}$ is the critical switching current, which is the minimum current amplitude to switch the MTJ resistance with a write pulse width of $\tau$. Usually $\Pr_{\rm sw}$ is a fixed parameter in the memory design specification.
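Equation (15) is easy to evaluate numerically. The sketch below is illustrative: the function name is hypothetical, and $\tau = 1$ ns is an assumed attempt time rather than a value confirmed by the text. The other example values ($t = 7$ ns, $\Delta = 40$, $I_{\rm C} = 500\ \mu$A) are the ones quoted in Section IV-B.

```python
import math


def switching_probability(i_read, i_crit, delta, t_pulse, tau):
    """Per-bit disturbance probability Pr_sw from Eq. (15).

    i_read, i_crit in amperes; t_pulse, tau in seconds; delta dimensionless.
    """
    inner = math.exp(-delta * (1.0 - i_read / i_crit))
    # 1 - exp(-x) written as -expm1(-x) to keep precision for very small x.
    return -math.expm1(-(t_pulse / tau) * inner)


# tau = 1 ns is an assumption made for this example.
p_200 = switching_probability(200e-6, 500e-6, 40.0, 7e-9, 1e-9)
p_265 = switching_probability(265e-6, 500e-6, 40.0, 7e-9, 1e-9)
print(p_200, p_265)
```

Because the read current appears inside a double exponential, a modest increase in $I_{\rm R}$ raises the disturbance probability by orders of magnitude.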

Based on (15), when  $Pr_{sw}$  is fixed, a larger  $I_C$  implies a larger maximum allowable cell current  $I_{R2}$ . When the write pulse width is longer than 10 ns, the relationship between  $I_C$  and the write pulse width t can be expressed by a theoretical equation as [6]

$$I_{\rm C} = I_{\rm C0} \left[ 1 - \left(\frac{kT}{E}\right) \ln\left(\frac{\tau}{\tau_0}\right) \right]. \tag{16}$$

Here  $I_{C0}$  is the critical switching current at 0 K;  $\tau_0$  is the inverse of the attempt frequency, or the write pulse width at 0 K; k is the Boltzmann constant; and T is the operating temperature. E is the magnetization stability energy barrier height. When E increases, MTJ becomes more stable, and hence a higher  $I_C$  is required to flip the magnetization of MTJ.

When an external magnetic field exists, magnetization stability energy barrier height E can be calculated as [15]

$$E = K_u V \left[ 1 + \left(\frac{H}{H_k}\right)^2 \right]$$
(17a)

when the longitudinal component of the magnetic field is the same as the magnetization direction of the free layer, or

$$E = K_u V \left[ 1 - \left(\frac{H}{H_k}\right)^2 \right]$$
(17b)

when the longitudinal component of the magnetic field is opposite to the magnetization direction of the free layer. Here  $K_u$  is the uniaxial anisotropy energy; V is the volume of the free layer; H is the longitudinal component of the external field;  $H_k$  is the anisotropy field.

Based on the above analysis, we propose the *R-I* curve skewing technique to increase $I_{R2}$ without affecting the write performance and the reliability of STT-RAM: in read operations, we apply a proper external magnetic field to raise the magnetization stability energy barrier height of the MTJs. Therefore, the critical switching current and the maximum allowable cell current are temporarily increased.

Fig. 7 depicts the concept of R-I curve skewing. We assume the cell current always passes through the MTJ from the free layer to the reference layer. The magnetic direction of the external field is opposite to that of the reference layer. When the original data is "1", the magnetization stability energy barrier height E is increased as (17a). The maximum allowable cell current increases accordingly. When the original data stored in the MTJ is "0", the magnetization stability energy barrier height E is reduced as (17b). However, the cell current in read operations will not change the magnetization direction of the free layer at such a direction.

The measured R-I sweep curves of the same MTJ under external magnetic fields with a longitudinal component of 47 Oe and 68 Oe are shown in Fig. 2. Here the magnetic direction of the external field is opposite to that of the reference layer. As expected, the critical switching current $I_{\rm C}$ increases as the magnitude of the applied magnetic field increases. When an external field of 68 Oe is applied, $I_{\rm C}$ increases up to ~750 $\mu$A.

Fig. 8 shows one possible STT-RAM cell design with the R-I curve skewing technique. A metal wire is placed on top of the memory cell. During a read operation, a current is injected into the metal wire and generates the required magnetic field. Assuming the distance between the MTJ device and the metal wire is 0.25 $\mu$m, which is the typical distance between the Metal 3 and Metal 4 layers in 130 nm CMOS technology [18], the current needed to generate a magnetic field of 68 Oe is less than 8 mA with magnetic cladding [16]. The generated magnetic field can be shared by multiple memory bits in one read operation. Note that the magnitude of the longitudinal magnetic field required to flip the magnetization direction of the free layer is above 300 Oe [17], which is much higher than that of the magnetic field applied in our R-I curve skewing technique. Hence, the probability that the external magnetic field disturbs other STT-RAM bits is negligible.

#### B. Yield-Driven Cell Current Selection

Equation (15) shows that $Pr_{sw}$ and $I_R/I_C$ have an exponential relationship. When the cell current is small, i.e., $I_{R1}(\sim 0.5 \cdot I_{R2})$, the incurred disturbance of MTJs can be ignored. In NSRS, $I_{R2}$ dominates the MTJ disturbance probability. Fig. 9 shows the simulated $Pr_{sw}$ of the same MTJ in Fig. 2 as the cell current $I_{R2}$ varies. The cell current pulse width is set to 7 ns. The magnetic parameters of a typical MTJ used in our simulation are shown in Table I.




Fig. 7. R-I curve skewing technique. (a) Concept. (b) R-I curve at different cases.



Fig. 8. Conceptual design of R-I curve skewing technique.



Fig. 9. Disturbance probabilities ( $Pr_{sw}$ ) of a 90 nm  $\times$  180 nm MTJ at various cell currents and the corresponding disturbance-clean probability of a 256 Kb STT-RAM array. Cell current pulse width is 7 ns.

TABLE I MAGNETIC PARAMETERS OF A TYPICAL MTJ

| $t$  | $\Delta$ | $\tau$ | $I_{\rm C}$    | $I_{\rm C}$ (68 Oe) |
|------|----------|--------|----------------|---------------------|
| 7 ns | 40\*     | 40     | 500 μA\*\*     | 750 μA              |

\* represents a 4-year data-retention time.

\*\* represents a switching current density of $3.9 \times 10^{6}$ A/cm<sup>2</sup>.

We define the disturbance clean probability  $(Pr_{no\_dis})$  as the probability of an STT-RAM array in which no bit is disturbed by the cell current  $I_{R2}$ . It can be expressed as:

$$\Pr_{\text{no\_dis}} = (1 - \Pr_{\text{sw}})^M.$$
(18)

Here $M$ is the total number of memory bits in the array. The values of $Pr_{no\_dis}$ at various cell currents for a 256 Kb STT-RAM array are also shown in Fig. 9. For example, when the cell current is 200 $\mu$A, the corresponding $Pr_{no\_dis}$ is about 99.99%.
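As a small numerical illustration of (18), the sketch below raises the per-bit survival probability to the power of the array size. The function name and the per-bit value `2.6e-10` are hypothetical inputs chosen for the example, not measured data.

```python
import math


def disturbance_clean_probability(pr_sw, num_bits):
    """Eq. (18): probability that none of num_bits cells is disturbed."""
    # exp(M * log1p(-p)) equals (1 - p)**M and keeps precision for tiny p.
    return math.exp(num_bits * math.log1p(-pr_sw))


# Hypothetical per-bit disturbance probability for a 256 Kb array.
pr_clean = disturbance_clean_probability(2.6e-10, 256 * 1024)
print(pr_clean)
```

For very small `pr_sw` the result is close to $1 - M \cdot \Pr_{\rm sw}$, so the array-level figure degrades linearly with the number of bits.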

Usually $\Pr_{no\_dis}$ is predetermined based on the desired memory yield. Therefore, we propose a yield-driven cell current selection flow as follows:

**Step 1**: Based on the chip yield requirement and memory size, a  $Pr_{no\_dis}$  is determined, e.g., 99.99% for a 256 Kb STT-RAM array. The cell current pulse width *t* is also decided in this step, e.g., 7 ns, based on the memory architecture and read circuitry design.

**Step 2**: The maximum allowable $Pr_{sw}$ of STT-RAM bits for the determined $Pr_{no\_dis}$ is calculated by (18).



Fig. 10. Selection of  $\alpha \cdot \beta$  under various current ratio  $\beta$ .

**Step 3**: The maximum allowable cell current $I_{\rm R2}$ of STT-RAM bits for the maximum allowable $\Pr_{\rm sw}$ is calculated by (15). Here, techniques that increase the critical switching current $I_{\rm C}$, e.g., the R-I curve skewing technique, may be applied simultaneously. To take into account the bit-to-bit variations of the critical switching current in STT-RAM, the lower bound of the $I_{\rm C}$ distribution of all STT-RAM bits may be adopted.
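The three steps can be sketched by inverting (18) and then (15) in closed form. This is an illustrative sketch only: the function name is hypothetical, and $\tau = 1$ ns is an assumed attempt time. The remaining example values ($t = 7$ ns, $\Delta = 40$, $I_{\rm C} = 500\ \mu$A, a 256 Kb array) are those quoted in this section.

```python
import math


def max_read_current(pr_no_dis, num_bits, t_pulse, tau, delta, i_crit):
    """Return (pr_sw_max, i_r2_max) for a target disturbance-clean probability.

    Step 1 supplies pr_no_dis, num_bits and t_pulse.
    Step 2 inverts Eq. (18); Step 3 inverts Eq. (15).
    """
    # Step 2: -ln(1 - Pr_sw) = -ln(Pr_no_dis) / M
    neg_log_survive = -math.log(pr_no_dis) / num_bits
    pr_sw_max = -math.expm1(-neg_log_survive)
    # Step 3: -ln(1 - Pr_sw) = (t / tau) * exp(-delta * (1 - I_R / I_C))
    i_r2_max = i_crit * (1.0 + math.log(neg_log_survive * tau / t_pulse) / delta)
    return pr_sw_max, i_r2_max


# tau = 1 ns is an assumption made for this example.
bits = 256 * 1024
_, i_strict = max_read_current(0.9999, bits, 7e-9, 1e-9, 40.0, 500e-6)
_, i_relaxed = max_read_current(0.99, bits, 7e-9, 1e-9, 40.0, 500e-6)
print(i_strict, i_relaxed)
```

Under these assumptions the two results land within a few percent of the 200 $\mu$A and 265 $\mu$A values read from Fig. 9. To account for bit-to-bit variation, `i_crit` would be set to the lower bound of the $I_{\rm C}$ distribution, as Step 3 suggests.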

By slightly relaxing $Pr_{no\_dis}$, the maximum allowable cell current $I_{R2}$ may increase substantially. For example, Fig. 9 shows that without applying R-I curve skewing, $I_{R2}$ increases from 200 $\mu$A to 265 $\mu$A when $Pr_{no\_dis}$ of a 256 Kb STT-RAM array is relaxed from 99.99% to 99%. When R-I curve skewing is applied (68 Oe field), the corresponding $I_{R2}$ further increases to 300 $\mu$A or 400 $\mu$A when $Pr_{no\_dis}$ is 99.99% or 99%, respectively. The degradation of chip yield could be fixed by an ECC (error correction code) scheme. Note that the 99% and 99.99% values are not the final bit error rate targets of STT-RAM chip design. Here we use them to demonstrate the effectiveness of the R-I curve skewing method, which is verified by the measurement results from the 16 Kb test chip.

#### C. Ratio Matching

1) Mismatch Between Cell Current Ratio $\beta$ and Voltage Ratio $\alpha$: The allowable $\alpha$ and $\beta$ values are constrained by (10). Based on the measured MTJ R-I data in Fig. 2, we calculated the selection region of $\alpha \cdot \beta$ under various current ratios $\beta$. The result is shown in Fig. 10. Here, LB\_xxx$\mu$A and UB\_xxx$\mu$A (xxx = 200, 300, and 400) denote the lower bound and the upper bound of the valid $\alpha \cdot \beta$ selection when $I_{R2} = xxx\,\mu$A, respectively. Due to the fluctuations in the MTJ R-I data measurement, several glitches exist on the curves in Fig. 10. The following observations were made from Fig. 10:

- 1) The lower-bound of  $\alpha \cdot \beta$ ,  $(1 + (R_{\rm L1} - R_{\rm L2})/(R_{\rm L2} + R_{\rm TR}))$ , is insensitive to the variations of  $\beta$  because the changes of the low resistance of the MTJ are very small at different cell currents;
- 2) When  $\beta$  increases, the scale of the  $\alpha \cdot \beta$  selection region (lower-bound, upper-bound) increases and approaches a saturated value, because the upper-bound  $(1 + (R_{\rm H1} - R_{\rm H2})/(R_{\rm H2} + R_{\rm TR}))$  becomes larger as  $\beta$  increases;
- 3) Similar to 2), when  $I_{\rm R2}$  increases, the scale of the  $\alpha \cdot \beta$  selection region increases too.

Fig. 10 indicates that a large  $\beta$  is desired in NSRS design for a wide  $\alpha \cdot \beta$  selection region. One way to obtain a large  $\beta$  is to decrease the read current  $I_{R1}$ . To maintain a reasonable read speed, we limited the minimum allowable  $I_{R1}$  to 20  $\mu$ A in the design. Accordingly, only the data satisfying this limitation are shown.
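As a rough illustration of how the selection region of  $\alpha \cdot \beta$  follows from the cell resistances, the sketch below evaluates the two bounds, written as  $1 + (R_{x1} - R_{x2})/(R_{x2} + R_{\rm TR})$  for the low and high states. The function name and the single-cell resistance values are hypothetical and chosen only for illustration; they are not the measured data of Fig. 2.

```python
def alpha_beta_window(r_h1, r_h2, r_l1, r_l2, r_tr):
    """Return (lower, upper) bounds of alpha*beta for one cell.

    r_x1 / r_x2 are the MTJ resistances (ohm) at read currents I_R1 / I_R2,
    r_tr is the access transistor resistance (ohm).
    """
    lower = 1.0 + (r_l1 - r_l2) / (r_l2 + r_tr)  # set by the low state
    upper = 1.0 + (r_h1 - r_h2) / (r_h2 + r_tr)  # set by the high state
    return lower, upper


# Hypothetical single-cell values (ohm); assumptions, not measured data.
lower, upper = alpha_beta_window(r_h1=2800.0, r_h2=2510.0,
                                 r_l1=1460.0, r_l2=1448.0, r_tr=917.0)
print(f"valid alpha*beta window: ({lower:.4f}, {upper:.4f})")
```

Because the low-state resistance barely changes with current, the lower bound stays close to 1, while the upper bound grows with the drop of the high-state resistance between the two read currents.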

2) Selection of Voltage Ratio  $\alpha$ : As shown in Fig. 7(a),  $\Delta V_{\rm BL,H}$ , the difference between  $V_{\rm BL,H1}$  and  $V_{\rm BL,H2O}$ , is an important parameter to measure the robustness of NSRS.  $\Delta V_{\rm BL,H}$  can be calculated by:

$$\begin{aligned}
\Delta V_{\text{BL,H}} &= V_{\text{BL,H1}} - V_{\text{BL,H2O}} \\
&= \frac{1}{\beta} I_{\text{R2}} \cdot (R_{\text{H1}} + R_{\text{TR}}) - \alpha I_{\text{R2}} \cdot (R_{\text{H2}} + R_{\text{TR}}) \\
&= \alpha I_{\text{R2}} \cdot \Delta R_{\text{Hmax}} (1 - \alpha)
\end{aligned} \tag{19}$$

when assuming  $\alpha \cdot \beta = 1$  in the conceptual design of NSRS. When  $\alpha = 0.5, \Delta V_{\rm BL,H}$  achieves its maximum value  $0.25 \cdot I_{\rm R2} \cdot \Delta R_{\rm Hmax}$ .
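The location of this maximum can be checked numerically. The following minimal sketch sweeps  $\alpha$  over the last line of (19), assuming  $\alpha \cdot \beta = 1$ ; the value used for  $\Delta R_{\rm Hmax}$  is a hypothetical placeholder, not a measured quantity.

```python
def sense_margin_high(alpha, i_r2, delta_r_hmax):
    """High-state margin (V) from the last line of Eq. (19), alpha*beta = 1."""
    return alpha * (1.0 - alpha) * i_r2 * delta_r_hmax


I_R2 = 200e-6          # A, one of the read currents discussed in the text
DELTA_R_HMAX = 400.0   # ohm, hypothetical value for illustration only

# Sweep alpha from 0.01 to 0.99 and keep the value giving the largest margin.
alphas = [k / 100 for k in range(1, 100)]
best_alpha = max(alphas,
                 key=lambda a: sense_margin_high(a, I_R2, DELTA_R_HMAX))
best_margin = sense_margin_high(best_alpha, I_R2, DELTA_R_HMAX)
print(f"best alpha = {best_alpha}, margin = {best_margin * 1e3:.1f} mV")
```

The sweep simply reflects that  $\alpha(1 - \alpha)$  is a downward parabola with its vertex at  $\alpha = 0.5$ .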

Fig. 11 shows the calculated  $\Delta V_{\rm BL,H}$ 's under various  $\alpha$  and  $I_{\rm R2}$  based on the measured MTJ R-I data in Fig. 2.  $\beta$  is set as  $1/\alpha$ . Again, the fluctuations in MTJ R-I curve testing lead to the glitches on the  $\Delta V_{\rm BL,H}$  curves. As expected, the maximum  $\Delta V_{\rm BL,H}$ 's are always achieved when  $\alpha \approx 0.5$ . In fact,  $\alpha = 0.5$  (a symmetric structure of voltage divider) is also helpful to minimize the impact of process variations.

3) Ratio Matching Technique: In NSRS circuitry, the voltage ratio  $\alpha$  of the voltage dividers is usually fixed, e.g., as the resistance ratio of two resistors in series. The deviation of  $\alpha$  from the designed value could easily exceed 5% [18], which affects the selection region of  $\alpha \cdot \beta$  constrained by (10). We propose a ratio matching technique to tune the value of  $\alpha \cdot \beta$  into the allowable region in the post-manufacture stage.

Fig. 12 shows the conceptual schematic of the ratio matching technique. One set of NSRS circuitry is composed of a sense amplifier, a capacitor (C1), a voltage divider (RU and RD), and a current-mirror current source. Every set of NSRS circuitry can be shared by multiple STT-RAM columns, with the active column selected by the column decoder signal ColDec. Within a column, the STT-RAM cell that needs to be read out is selected by the row decoder signal RowDec. Signals SLT1 and SLT2 control the timing of the two sense steps associated with  $I_{R1}$  and  $I_{R2}$ , respectively. Two reference currents of the current mirror are used to generate  $I_{R1}$  and  $I_{R2} - I_{R1}$ , respectively.
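The two sense steps can be summarized with a simple behavioral model: the bit-line voltage under  $I_{R1}$  is held on C1, the bit-line voltage under  $I_{R2}$  is scaled by the divider ratio  $\alpha$ , and the sense amplifier compares the two. The sketch below is only an idealized illustration; it ignores leakage, timing and amplifier offset, assumes the high-resistance state encodes '1', and uses hypothetical MTJ resistances.

```python
def nsrs_read(r_mtj_at_ir1, r_mtj_at_ir2, i_r1, i_r2, alpha, r_tr):
    """Idealized two-step NSRS read; returns (bit, margin in volts)."""
    v_bl1 = i_r1 * (r_mtj_at_ir1 + r_tr)              # step 1, stored on C1
    v_bl2_out = alpha * i_r2 * (r_mtj_at_ir2 + r_tr)  # step 2, divider output
    margin = v_bl1 - v_bl2_out
    # Assumption: a positive margin means high resistance, read as '1'.
    return (1 if margin > 0 else 0), margin


R_TR = 917.0                   # ohm, mean transistor resistance from the text
I_R1, I_R2 = 96.8e-6, 200e-6   # A, one current pair used in the simulations

# Hypothetical MTJ resistances (ohm) at the two read currents.
bit_h, margin_h = nsrs_read(2800.0, 2510.0, I_R1, I_R2, 0.5, R_TR)
bit_l, margin_l = nsrs_read(1460.0, 1448.0, I_R1, I_R2, 0.5, R_TR)
print(bit_h, f"{margin_h * 1e3:.1f} mV")
print(bit_l, f"{margin_l * 1e3:.1f} mV")
```

No write operation appears anywhere in the read path, which is what makes the scheme nondestructive.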

The magnitudes of the reference currents of the current mirror are controlled by two programmable decoders: Decoder 1 and Decoder 2. When an output bit of a decoder is low, the corresponding reference current component is removed from the total reference current. By doing so, the cell current ratio  $\beta$  can be adjusted to compensate for the variation of the voltage ratio  $\alpha$  and ensure that the value of  $\alpha \cdot \beta$  falls within the allowable region.
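One possible way to picture the post-manufacture tuning is a search over decoder codes. The sketch below is a simplified assumption rather than the test chip's actual procedure: it treats the reference current components as binary weighted, tunes only  $I_{R1}$  (Decoder 2 and  $I_{R2} - I_{R1}$  are left out), and uses a hypothetical unit current, bit width and  $\alpha \cdot \beta$  window.

```python
def reference_current(code, unit_ua, n_bits):
    """Sum of enabled components; bit k adds unit_ua * 2**k when it is high."""
    return sum(unit_ua * (1 << k) for k in range(n_bits) if (code >> k) & 1)


def match_ratio(alpha, i_r2_ua, window, unit_ua=4.0, n_bits=5):
    """Pick the code whose alpha*beta lies in the window, nearest its center.

    Returns (code, i_r1_ua, alpha_beta) or None if no code fits.
    """
    lower, upper = window
    target = 0.5 * (lower + upper)
    best = None
    for code in range(1, 1 << n_bits):      # code 0 would give zero current
        i_r1 = reference_current(code, unit_ua, n_bits)
        product = alpha * i_r2_ua / i_r1    # alpha * beta
        if lower < product < upper:
            error = abs(product - target)
            if best is None or error < best[0]:
                best = (error, code, i_r1, product)
    return None if best is None else best[1:]


# Divider ratio drifted 5% above the designed 0.5; window is hypothetical.
result = match_ratio(alpha=0.525, i_r2_ua=200.0, window=(1.01, 1.10))
print(result)
```

In such a flow, the selected code would be the value stored in the nonvolatile configuration storage after testing.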

The optimal configuration of Decoder 1 and Decoder 2 can be determined in the chip testing stage and stored in nonvolatile memory or devices, e.g., fuse array. If the nonvolatile memory or devices are reprogrammable, e.g., STT-RAM-based latches [19], such testing can be executed multiple times to compensate for the shifting of  $\alpha$  and the  $\alpha \cdot \beta$  selection region due to circuit aging.

Fig. 11. Sense margin under various  $\alpha$ .

Fig. 12. Conceptual schematic of ratio matching technique.

#### V. TEST CHIP DETAILS AND EXPERIMENTAL RESULTS

#### A. 16 Kb Test Chip Design

To demonstrate the effectiveness of NSRS, we fabricated a 16 Kb 1T1J STT-RAM test chip in 130 nm front-end-of-line technology [18] and 65 nm in-house MTJ process. Fig. 13 shows the design diagram. A 16 Kb STT-RAM array is constructed with 128 columns and 128 rows. Eight IO blocks, each of which includes a conventional sense circuitry and an NSRS sense circuitry with ratio matching function, are designed to support an 8-bit read/write bandwidth. A central timing control logic block and an analog control logic block are designed to generate the timing and reference signals for the corresponding circuit blocks. An external clock is sent into the global clock generation block to produce all required on-chip clock signals.

Fig. 14 shows the conceptual write circuitry of one memory column which was used in the test chip design. The column selection is controlled by the ColDec signal. The polarity of WE1 and WE1B determines the direction of current flow through the MTJ device and hence the data ('1' or '0') to be written into the cell. For example, when writing '0', WE1 = 0 and WE1B = 1, so that a write current is supplied from BL to SL.

For testing purposes, WL drivers, BL drivers, SL drivers and the NMOS transistors in STT-RAM cells are implemented with 3.3 V MOS devices in order to provide up to 2 mA write current to the MTJs. The test chip design also supports a direct access mode, in which both WL and BL voltages can be separately adjusted via optional PADs. Thus, we are able to supply current to a single STT-RAM cell and collect the read-out data from the sensing circuit.

The selection of resistors RU and RD mainly concerns the leakage current in read operations. The selection of capacitance is constrained by the  $V_{\rm BL}$  charge time, which is targeted at 7 ns in our design. All these circuits have been implemented in the test chip. Common layout techniques were used to minimize the impacts of device variations. Since the peripheral circuits are shared by multiple columns, the area overhead is small.

Other circuitries, i.e., sense circuitry and control logic, are implemented with 1.2 V MOS devices. An auto-zero sense-amplifier with a built-in data latch [26] is adopted in both conventional and NSRS sense circuitries to minimize the influence of device mismatch on the read operation. The designed voltage ratio  $\alpha$  of NSRS sense circuitry is 0.5. The control signals of the reference current (see Fig. 12) in NSRS sense circuitry can be programmed by external signals. The magnitude of the external magnetic field generated by a current-controlled magnet is monitored by a gauss meter.

Fig. 13. Design diagram of 16 Kb test chip.

Fig. 14. Conceptual schematic of write circuitry.

Fig. 15 shows the simulation results of NSRS when  $I_{R2} = 200 \ \mu\text{A}$ , 300  $\mu\text{A}$ , and 400  $\mu\text{A}$ , based on the MTJ R-I curve in Fig. 2. The corresponding  $I_{R1}$ 's are 96.8  $\mu\text{A}$ , 147.6  $\mu\text{A}$ , and 191.2  $\mu\text{A}$ , respectively. The leakage current generated by the other unselected memory cells on the same column was considered in the simulation. When  $I_{R2}$  increases from 200  $\mu\text{A}$  to 400  $\mu\text{A}$ , the minimum sense margin of NSRS (including both reading "1" and "0") rises from 12 mV to 49 mV. The whole sense operation can be completed within 15 ns.
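For reference, the current ratios implied by these simulation settings can be worked out directly. The short sketch below computes  $\beta = I_{R2}/I_{R1}$  and the product  $\alpha \cdot \beta$  for the three current pairs, taking  $\alpha = 0.5$  as the designed divider ratio; the helper name is hypothetical.

```python
ALPHA = 0.5  # designed voltage ratio of the NSRS divider

# (I_R2, I_R1) pairs in microamperes, as quoted for the simulations.
READ_CURRENTS_UA = [(200.0, 96.8), (300.0, 147.6), (400.0, 191.2)]


def ratio_products(pairs, alpha):
    """Return a list of (beta, alpha*beta) for each (I_R2, I_R1) pair."""
    return [(i_r2 / i_r1, alpha * i_r2 / i_r1) for i_r2, i_r1 in pairs]


products = ratio_products(READ_CURRENTS_UA, ALPHA)
for (i_r2, i_r1), (beta, ab) in zip(READ_CURRENTS_UA, products):
    print(f"I_R2={i_r2:.0f} uA, I_R1={i_r1} uA: "
          f"beta={beta:.3f}, alpha*beta={ab:.3f}")
```

In all three cases  $\alpha \cdot \beta$  lies slightly above 1, which is the direction expected from a lower bound that is itself only marginally greater than 1.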

The memory cell size of our 16 Kb 1T1J STT-RAM is 4.57  $\mu$ m<sup>2</sup>. The die size including pad ring is 6.22 mm<sup>2</sup>. The size of 16 Kb STT-RAM itself is 0.54 mm<sup>2</sup>. The memory array size is 0.089 mm<sup>2</sup>. Fig. 16 shows the die photo of our test chip.
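As a quick consistency check on these figures, the sketch below multiplies the cell size by the number of cells in the 128 x 128 array and compares the result with the quoted array area. The term "cell fraction" is introduced here only for illustration, and the remaining array area is not broken down in the text.

```python
ROWS, COLS = 128, 128
CELL_AREA_UM2 = 4.57     # um^2 per 1T1J cell
ARRAY_AREA_MM2 = 0.089   # mm^2, quoted memory array size

n_cells = ROWS * COLS                             # number of bits
cells_area_mm2 = n_cells * CELL_AREA_UM2 * 1e-6   # 1 mm^2 = 1e6 um^2
cell_fraction = cells_area_mm2 / ARRAY_AREA_MM2   # share taken by cells

print(f"{n_cells} cells occupy {cells_area_mm2:.4f} mm^2 "
      f"({cell_fraction:.0%} of the array area)")
```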

#### B. Chip Testing Results

Our electrical test results show that the process variation of 130 nm CMOS technology [18] is well controlled: the mean  $(\mu)$  and the standard deviation (Std., or  $\sigma$ ) of the NMOS transistor resistance in 1T1J STT-RAM cells are 917  $\Omega$  and 15  $\Omega$ , respectively. However, a large variation of MTJ resistances was observed: when the cell current is set to 20  $\mu$ A, the high resistances of the measured MTJs vary from 2240  $\Omega$  to 7680  $\Omega$ , and the low resistances of the measured MTJs range from 970  $\Omega$  to 3460  $\Omega$ . More statistical parameters of the MTJ resistances at different cell currents are summarized in Table II. Interestingly, as  $I_{\rm R2}$  increases, the distributions of  $R_{\rm H2}$  and  $R_{\rm L2}$  become tighter. This fact indeed helps improve the effectiveness of NSRS. The raw bit yield of the 16 Kb test chip is more than 98.3%, which accounts for all the process defects, such as stuck-at-1's and stuck-at-0's.

Reliable NSRS operations are demonstrated by our 16 Kb STT-RAM test chip. Fig. 17 shows the measured sense margins of NSRS with optimized cell current ratio. Without applying an external magnetic field, the maximum allowable cell current  $I_{\rm R2}$  is 200  $\mu$ A. The minimum sense margin measured among all bits is about 8 mV. If we apply an external magnetic field of 68 Oe, the minimum sense margin is significantly improved to more than 15 mV with  $I_{\rm R2} = 300 \ \mu$ A, or even 20 mV with  $I_{\rm R2} = 400 \ \mu$ A. Based on the calculation in Section IV-B, the disturbance clean probability  $Pr_{\rm no\_dis}$  of such a 16 Kb STT-RAM array reduces from ~100% to 99.91% when  $I_{\rm R2}$  changes from 200  $\mu$ A to 400  $\mu$ A. However, we did not observe any visible changes of bit disturbance statistics due to the increase of cell current in our testing.
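To relate an array-level  $Pr_{\rm no\_dis}$  to a different array size, one simple approximation is to treat all bits as independent and identical, so that the array-level probability is the per-bit probability raised to the number of bits. The sketch below applies this assumption, which is a simplification and not the calculation of Section IV-B, to rescale the 16 Kb figure to a 256 Kb array.

```python
def scale_no_disturb_probability(pr_array, bits_from, bits_to):
    """Rescale an array-level no-disturb probability to another array size.

    Assumes independent, identical bits: Pr_array = p_bit ** n_bits.
    """
    return pr_array ** (bits_to / bits_from)


KB = 1024
pr_16kb = 0.9991   # value quoted for the 16 Kb array at I_R2 = 400 uA
pr_256kb = scale_no_disturb_probability(pr_16kb, 16 * KB, 256 * KB)
print(f"256 Kb estimate: {pr_256kb:.2%}")
```

Under this assumption the probability falls with array size, which is why the allowable cell current has to be chosen against the target capacity rather than against a single bit.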



Fig. 15. Simulation result of NSRS.



Fig. 16. 16 Kb STT-RAM die photo.

TABLE II Statistic Parameters of MTJ Resistance

| $I_{\rm R2}$ ($\mu$A) | $R_{\rm H2}$: $\mu$ ($\Omega$) | $R_{\rm H2}$: $\sigma$ ($\Omega$) | $R_{\rm H2}$: $\sigma/\mu$ | $R_{\rm L2}$: $\mu$ ($\Omega$) | $R_{\rm L2}$: $\sigma$ ($\Omega$) | $R_{\rm L2}$: $\sigma/\mu$ |
|-----|------|-----|-------|------|-----|-------|
| 200 | 2510 | 644 | 0.256 | 1448 | 491 | 0.339 |
| 300 | 2301 | 540 | 0.234 | 1425 | 456 | 0.320 |
| 400 | 2094 | 437 | 0.209 | 1402 | 422 | 0.301 |
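The tightening of the distributions can be read off the  $\sigma/\mu$  columns of Table II. As a small cross-check, the sketch below recomputes the ratio from the tabulated means and standard deviations; the recomputed values agree with the table to within about 0.001, with the small differences presumably coming from rounding of the tabulated inputs.

```python
# Table II entries: I_R2 (uA) -> state -> (mean ohm, std ohm, listed std/mean)
TABLE_II = {
    200: {"R_H2": (2510, 644, 0.256), "R_L2": (1448, 491, 0.339)},
    300: {"R_H2": (2301, 540, 0.234), "R_L2": (1425, 456, 0.320)},
    400: {"R_H2": (2094, 437, 0.209), "R_L2": (1402, 422, 0.301)},
}


def coefficient_of_variation(mean, std):
    """Relative spread of a distribution (std divided by mean)."""
    return std / mean


for current, states in sorted(TABLE_II.items()):
    for state, (mean, std, listed) in states.items():
        cv = coefficient_of_variation(mean, std)
        print(f"I_R2={current} uA {state}: computed {cv:.3f}, listed {listed}")
```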



Fig. 17. Sense margin comparison. NSRS successfully senses all memory bits.

In the conventional sensing scheme, around 10% or 20% of memory bits failed to be read out at 8 mV or 20 mV sense margin, respectively. In fact, about 10% of memory bits cannot be sensed no matter how we optimized our testing configurations, e.g., the reference voltage. Here the cell current of the conventional sensing scheme is set to 50  $\mu$ A. Please note that increasing the cell current of the conventional sensing scheme may even hurt the sensing reliability because of the reduced difference between the $R_H$ and $R_L$ of MTJs.

TABLE III COMPARISON OF ENERGY AND LATENCY FOR DIFFERENT SCHEMES

| Designs | Bit energy per read (Read '1') | Bit energy per read (Read '0') | Read latency |
|---------|----------|---------|--------|
| CVSS    | 0.907 pJ | 0.88 pJ | 2.5 ns |
| CSRS    | 22.1 pJ  | 22 pJ   | 40 ns  |
| NSRS    | 1.16 pJ  | 0.92 pJ | 15 ns  |

The peak write power of the test chip is 6.2 mW. The peak read powers of conventional sensing scheme and NSRS are 2.5 mW and 5.9 mW ( $I_{\rm R2} = 400 \ \mu A$ ), respectively. Here we did not take into account the power consumed by the magnetic field generation.

#### C. Comparison and Discussion

We also compared the different sensing schemes in terms of performance and energy consumption. The simulation results are shown in Table III. Though NSRS has a more complex hardware implementation, the actual silicon overhead on the overall system is small if considering that each sensing circuitry is shared by multiple columns.

Owing to its small read current and short read period, the read energy of CVSS is very small. By comparison, the read energy of CSRS is much higher, mainly because of the two write steps (erase and write back of the original value). The read period of NSRS is longer than that of CVSS (15 ns versus 2.5 ns in Table III) because capacitor charging requires a certain amount of time. As a result, the read energy of NSRS is higher than that of CVSS, although it remains far below that of CSRS.

The read latency can significantly impact the system performance. CVSS has the shortest read latency among three schemes since it requires only one sensing operation. However, the poor yield constrains its applications. CSRS suffers from two long write steps, and hence, the read latency is much longer than the others. Comparably, NSRS can obtain a better trade-off among performance, energy and yield.
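The trade-off can be made concrete with the ratios implied by Table III. The sketch below uses the read '1' energies and the read latencies from the table; the dictionary layout and helper name are only illustrative.

```python
# Table III values: scheme -> (read '1' energy in pJ, read latency in ns)
TABLE_III = {
    "CVSS": (0.907, 2.5),
    "CSRS": (22.1, 40.0),
    "NSRS": (1.16, 15.0),
}


def ratio(scheme_a, scheme_b):
    """Return (energy ratio, latency ratio) of scheme_a over scheme_b."""
    e_a, t_a = TABLE_III[scheme_a]
    e_b, t_b = TABLE_III[scheme_b]
    return e_a / e_b, t_a / t_b


nsrs_vs_cvss = ratio("NSRS", "CVSS")
csrs_vs_nsrs = ratio("CSRS", "NSRS")
print(f"NSRS vs CVSS: {nsrs_vs_cvss[0]:.2f}x energy, {nsrs_vs_cvss[1]:.1f}x latency")
print(f"CSRS vs NSRS: {csrs_vs_nsrs[0]:.1f}x energy, {csrs_vs_nsrs[1]:.2f}x latency")
```

On these numbers NSRS costs roughly 1.3 times the read energy and 6 times the latency of CVSS, while CSRS costs roughly 19 times the energy and 2.7 times the latency of NSRS.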

Besides the process variations of CMOS technology, the process variations of MTJ devices also play an important role in the effectiveness of all the read methodologies in STT-RAM designs. For example, the  $\Delta R_{Hmax}$  variation with temperature fluctuation [20] could significantly affect the self-reference schemes, including both CSRS and the proposed NSRS methodology. Readers can refer to [21] for more details.

#### VI. CONCLUSION

In this work, we proposed a novel self-reference read scheme for STT-RAM called "Nondestructive Self-reference Sensing Scheme (NSRS)" to overcome the large bit-to-bit variation of MTJ resistances. Our scheme does not require any write operations, and therefore maintains the non-volatility of the STT-RAM design and a short read latency. Based on the physics of the magnetic devices, we also proposed three techniques to further improve the robustness of NSRS. Theoretical analysis shows that by applying the "yield-driven cell current selection" and "R-I curve skewing" techniques, the cell current of STT-RAM is significantly increased with minimized penalty on the memory reliability. Moreover, the "ratio matching" technique can compensate for the process variation in the voltage divider by adjusting the cell current in the post-manufacture stage. The measurements of our 16 Kb test chip proved the effectiveness of the proposed techniques.

#### REFERENCES

- [1] K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The case for a single-chip multiprocessor," in *Proc. 7th Int. Symp. Architectural Support for Programming Languages and Operating Systems*, Oct. 1996, pp. 2–11.
- [2] K. Kim and G. Jeong, "Memory technologies for sub-40 nm node," in Int. Electron Devices Meeting Dig., Dec. 2007, pp. 27–30.
- [3] M. Motoyoshi, I. Yamamura, W. Ohtsuka, M. Shouji, H. Yamagishi, M. Nakamura, H. Yamada, K. Tai, T. Kikutani, T. Sagara, K. Moriyama, H. Mori, C. Fukumoto, M. Watanabe, H. Hachino, H. Kano, K. Bessho, H. Narisawa, M. Hosomi, and N. Okazaki, "A study for 0.18 μm high-density MRAM," in *VLSI Symp. Dig.*, Jun. 2004, pp. 22–23.
- [4] Y. K. Ha, J. E. Lee, H.-J. Kim, J. S. Bae, S. C. Oh, K. T. Nam, S. O. Park, N. I. Lee, H. K. Kang, U.-I. Chung, and J. T. Moon, "MRAM with novel shaped cell using synthetic anti-ferromagnetic free layer," in *VLSI Symp. Dig.*, Jun. 2004, pp. 24–25.
- [5] M. Durlam, D. Addie, J. Akerman, B. Butcher, P. Brown, J. Chan, M. Deherrera, B. N. Engel, B. Feil, G. Grynkewich, J. Janesky, M. Johnson, K. Kyler, J. Molla, J. Martin, K. Nagel, J. Ren, N. D. Rizzo, T. Rodriguez, L. Savtchenko, J. Salter, J. M. Slaughter, K. Smith, J. J. Sun, M. Lien, K. Papworth, P. Shah, W. Qin, R. Williams, L. Wise, and S. Tehrani, "A 0.18 μm 4 Mb toggling MRAM," in *Int. Electron Device Meeting Dig.*, Dec. 2003, pp. 34.6.1–34.6.3.
- [6] M. Hosomi, H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada, M. Shoji, H. Hachino, C. Fukumoto, H. Nagao, and H. Kano, "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM," in *Int. Electron Device Meeting Dig.*, Dec. 2005, pp. 473–476.
- [7] T. Kawahara, R. Takemura, K. Miura, J. Hayakawa, S. Ikeda, Y. Lee, R. Sasaki, Y. Goto, K. Ito, I. Meguro, F. Matsukura, H. Takahashi, H. Matsuoka, and H. Ohno, "2 Mb spin-transfer torque RAM (SPRAM) with bit-by-bit bidirectional current write and parallelizing-direction current read," in *IEEE Int. Solid-State Circuits Conf. Dig.*, Feb. 2007, pp. 480–617.
- [8] H. Tanizaki, T. Tsuji, J. Otani, Y. Yamaguchi, Y. Murai, H. Furuta, S. Ueno, T. Oishi, M. Hayashikoshi, and H. Hidaka, "A high-density and high-speed 1T-4MTJ MRAM with voltage offset self-reference sensing scheme," in *Proc. Asian Solid-State Circuits Conf.*, Nov. 2006, pp. 303–306.
- [9] S. Tehrani, B. Engel, J. M. Slaughter, E. Chen, M. DeHerrera, M. Durlam, P. Naji, R. Whig, J. Janesky, and J. Calder, "Recent developments in magnetic tunnel junction MRAM," *IEEE Trans. Magn.*, vol. 36, pp. 2752–2757, Sept. 2000.
- [10] J. Li, C. Augustine, S. S. Salahuddin, and K. Roy, "Modeling of failure probability and statistical design of spin-torque transfer magnetic random access memory (STT MRAM) array for yield enhancement," in *Proc. Design Automation Conf.*, Jun. 2008, pp. 278–283.
- [11] G. Jeong, W. Cho, S. Ahn, H. Jeong, G. Koh, Y. Hwang, and K. Kim, "A 0.24-µm 2.0-V 1T1MTJ 16-kb nonvolatile magnetoresistance RAM with self-reference sensing scheme," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1906–1910, Nov. 2003.
- [12] Y. Chen, X. Wang, H. Li, H. Liu, and D. V. Dimitrov, "Design margin exploration of spin-torque transfer RAM (SPRAM)," in *Int. Symp. Quality Electronic Design*, Mar. 2008, pp. 684–690.

- [13] H. Li and Y. Chen, "An overview of nonvolatile memory technology and the implication for tools and architectures," in *Design, Automation & Test in Europe Conf. and Exhibit*, Apr. 2009, pp. 731–736.
- [14] Y. Higo, K. Yamane, K. Ohba, H. Narisawa, K. Bessho, M. Hosomi, and H. Kano, "Thermal activation effect on spin transfer switching in magnetic tunnel junctions," *Appl. Phys. Lett.*, vol. 87, p. 082502, 2005.
- [15] R. H. Koch, J. A. Katine, and J. Z. Sun, "Time-resolved reversal of spin-transfer switching in a nanomagnet," *Phys. Rev. Lett.*, vol. 92, p. 088302, 2004.
- [16] MRAM Technical Guide, Freescale Semiconductor.
- [17] J. Slaughter, "MRAM technology: Status and future challenges," in Cornell CNS Nanotechnology Symp., May 2004.
- [18] TSMC 130 nm CMOS Process Design Manual. [Online]. Available: www.mosis.com
- [19] S. Paul, S. Mukhopadhyay, and S. Bhunia, "Hybrid CMOS-STTRAM non-volatile FPGA: Design challenges and optimization approaches," in *IEEE/ACM Int. Conf. Computer-Aided Design*, 2008, pp. 589–592.
- [20] K. Ono, T. Kawahara, R. Takemura, K. Miura, H. Yamamoto, M. Yamanouchi, J. Hayakawa, K. Ito, H. Takahashi, S. Ikeda, H. Hasegawa, H. Matsuoka, and H. Ohno, "Disturbance-free read scheme and a compact stochastic-spin-dynamics-based MTJ circuit model for Gb-scale SPRAM," in 2009 IEEE Int. Electron Devices Meeting (IEDM) Dig., Dec. 7–9, 2009, pp. 1–4.
- [21] Z. Sun, H. Li, Y. Chen, and X. Wang, "Variation tolerant sensing scheme of spin-transfer torque memory for yield improvement," in *Proc. Int. Conf. Computer-Aided Design (ICCAD)*, Nov. 2010, pp. 432–437.
- [22] K.-P. Chang, H.-T. Lue, Y.-H. Hsiao, K.-Y. Hsieh, and C.-Y. Lu, "Effect of junction engineering for 38 nm BE-SONOS charge-trapping," in 2011 Int. Symp. VLSI Technology, Systems and Applications (VLSI-TSA), Apr. 2011, pp. 1–2.
- [23] A. Spessot, A. Calderoni, P. Fantini, A. S. Spinelli, C. M. Compagnoni, F. Farina, A. L. Lacaita, and A. Marmiroli, "Variability effects on the VT distribution of nanoscale NAND flash memories," in *Proc. 2010 IEEE Int. Reliability Physics Symp. (IRPS)*, May 2010, pp. 970–974.
- [24] S. C. Oh, J. H. J. H., W. C. Lim, W. J. Kim, Y. H. Kim, H. J. Shin, J. E. Lee, Y. G. Shin, S. Choi, and C. Chung, "On-axis scheme and novel MTJ structure for sub-30 nm Gb density STT-MRAM," in 2010 IEEE Int. Electron Devices Meeting (IEDM) Dig., Dec. 2010, pp. 12.6.1–12.6.4.
- [25] D. Halupka, S. Huda, W. Song, A. Sheikholeslami, K. Tsunoda, C. Yoshida, and M. Aoki, "Negative-resistance read and write schemes for STT-MRAM in 0.13 μm CMOS," in 2010 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig., Feb. 2010, pp. 256–257.
- [26] M.-K. Choi, B.-G. Jeon, N. Jang, B.-J. Min, Y.-J. Song, S.-Y. Lee, H.-H. Kim, D.-J. Jung, H.-J. Joo, and K. Kim, "A 0.25 μm 3.0 V 1T1C 32 Mb nonvolatile ferroelectric RAM with address transition detector (ATD) and current forcing latch sense amplifier (CFLSA) scheme," in 2002 IEEE Int. Solid-State Circuits Conf. Dig., 2002, pp. 162–457.
- [27] International Technology Roadmap for Semiconductors (ITRS), 2010 [Online]. Available: http://www.itrs.net/links/2010itrs/home2010.htm
- [28] M. Aoki, H. Iwasa, and Y. Sato, "A novel voltage sensing 1T/2MTJ cell with resistance ratio for highly stable and scalable MRAM," in 2005 Symp. VLSI Circuits Dig., Jun. 2005, pp. 170–171.
- [29] N. Sakimura, T. Sugibayashi, T. Honda, H. Honjo, S. Saito, T. Suzuki, N. Ishiwata, and S. Tahara, "MRAM cell technology for over 500-MHz SoC," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 830–838, Apr. 2007.

**Yiran Chen** (M'05) received the B.S. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2005.

He is currently an Assistant Professor with the Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA. Previously, Dr. Chen worked with Synopsys Inc. and Seagate Technology. His research interests include VLSI design/CAD for nano-scale Silicon and non-Silicon technologies, low-power circuit design and computer architecture, emerging memory technologies and nano-scale reconfigurable computing system and sensor system. He has published about 80 technical publications in refereed journals and conferences and 2 book chapters, has 37 granted U.S. patents and the other 28 pending applications, and 1 Seagate Trade Secret. His book *Nonvolatile Memory Design: Magnetic, Resistive, and Phase Changing*, will be published in 2011 by CRC Press.

Dr. Chen has been the Organization Committee, Technical Program Committee Member, Track Chair, Session Chair of many international conferences. He has been a reviewer for numerous journals and conferences. He received "The hot 100 products of 2006—PrimeTimeVX" from EDN and the finalist of "Prestigious 2007 DesignVision Awards" from International Engineering Consortium (IEC). He also received the "PrimeTimeVX—EDN 100 Hot Products Distinction" from Synopsys Inc. He received one best paper award and two best paper nominations from ISQED 2008, 2010 and 2005, respectively, one best paper award from ISLPED 2010, one best paper candidate from DATE 2010, and one best paper nomination from ASP-DAC 2011. His invention of Spintronic Memristor was interviewed and reported by IEEE Spectrum in March 2009.

**Hai Li** (M'08) received the B.S. and M.S. degrees in microelectronics from Tsinghua University, Beijing, China, and the Ph.D. degree from the Electrical and Computer Engineering Department at Purdue University, West Lafayette, IN, in 2004.

She is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Polytechnic Institute of New York University, Brooklyn. She was with Qualcomm, Intel, and Seagate earlier. Her research interests include architecture/circuit/device co-optimization for green VLSI systems, emerging embedded memory design, 3D integration technology and design, design for new devices, and neuromorphic architecture for brain-inspired electronic systems. She has published 50+ technical papers in refereed journals and conferences, filed 51 U.S. patents (29 granted) and one Seagate Trade Secret. She also authored one book chapter and will publish one book about emerging memory design by CRC Press.

Dr. Li received two best paper awards and three best paper nominations from ISQED, ISLPED, DATE, and ASPDAC. Dr. Li has served as technical program committee of more than 20 international conference series. She has been the reviewer for numerous journals and conferences.

**Xiaobin Wang** received the Ph.D. degree in physics from the University of California at San Diego (UCSD) in 2003. At UCSD, his study was focused on dynamic thermal magnetization switching and magnetization noise in data storage devices.

He has been with Seagate Technology since 2003. His work at Seagate Technology includes advanced transducer modeling, design and testing, prediction of storage and memory system performance through bottom up (from physics to system performance) and top down (from system performance to component requirements) approaches, alternative technology investigation and intellectual property analysis.

Dr. Wang published over 50 papers and had over 30 patents granted and pending. He was invited to contribute at various conferences and journal reviews.



**Wenzhong Zhu** received the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2004.

He is currently a Senior Staff Engineer in the Twin City Operation at Seagate Technology, Bloomington, MN, where he is working on hard drive recording subsystems. Dr. Zhu has published over 30 papers, and has over 25 patents granted and pending and one Seagate Trade Secret.



**Wei Xu** received the B.S. and M.S. degrees from Fudan University, Shanghai, China, in 2003 and 2006, respectively, and the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 2009.

He joined Marvell Technology, Santa Clara, CA, in 2009 as a Senior Design Engineer. His research interests include circuit and architecture design for memory and data storage systems, including phase-change memory, spin-transfer torque magnetoresistive memory, solid-state drive and hard disk drive.



**Tong Zhang** (M'02–SM'08) received the B.S. and M.S. degrees in electrical engineering from the Xian Jiaotong University, Xian, China, in 1995 and 1998, respectively. He received the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2002.

He has been with Rensselaer Polytechnic Institute, Troy, NY, since 2002, and he is now an Associate Professor in the Electrical, Computer and Systems Engineering Department. His current research interests include circuits and systems for data storage, signal processing, and computing.

Currently Dr. Zhang serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II and the IEEE TRANSACTIONS ON SIGNAL PROCESSING.