# **A Study of Multitrack Joint 2-D Signal Detection Performance and Implementation Cost for Shingled Magnetic Recording**

Ning Zheng<sup>1</sup>, Kalyana Sundaram Venkataraman<sup>2</sup>, Aleksandar Kavcic<sup>3,4</sup>, and Tong Zhang<sup>1</sup>

<sup>1</sup>Department of Electrical and Computer Systems Engineering, Rensselaer Polytechnic Institute, NY 12180 USA

<sup>2</sup>Cavium, San Jose, CA 95131 USA

<sup>3</sup>Department of Electrical Engineering, University of Hawaii, HI 96822 USA

<sup>4</sup>Institute of Network Coding, Chinese University of Hong Kong, Hong Kong

**Shingled magnetic recording is a promising option to sustain the historical areal density growth of hard disk drives while retaining conventional heads and media. Nevertheless, highly scaled shingled magnetic recording suffers from severe intertrack interference (ITI) and fewer grains per channel bit, and therefore from a lower signal-to-noise ratio (SNR). This naturally demands 2-D read channel signal processing, which spans an inherently wide range of tradeoffs between detection performance and computational complexity. By concurrently detecting multitrack readback signals from a read head array, joint 2-D signal detection can fully utilize the 2-D interference to maximize detection performance, at the cost of the highest computational complexity. Because of justifiable concerns about its practical feasibility, multitrack joint 2-D detection has not been thoroughly studied from either the detection performance or the silicon implementation perspective. To fill this gap, this paper presents a comprehensive study of multitrack joint 2-D signal detection performance and silicon implementation cost. We further present an interleaved pipelining strategy to reduce the silicon consumption of joint 2-D signal detectors. Through comprehensive simulations and application-specific integrated circuit (ASIC) design, this paper shows that multitrack joint 2-D signal detection is a practically attractive option with superior detection performance and affordable silicon cost, especially when considering projected CMOS technology scaling toward 16 nm and below.**

*Index Terms***— 2-D equalization, 2-D Viterbi detection, generalized partial response (GPR) target, interleaved pipelining, shingled magnetic recording.**

## I. INTRODUCTION

AS ONE promising option to continue the historical scaling of magnetic recording storage areal density, shingled recording technology has recently received attention [1], [2]. Unlike alternative technologies, shingled recording keeps the conventional recording heads and media and relies on well-controlled track overlap to increase the storage areal density. With a much tighter track pitch, shingled recording suffers from significant intertrack interference (ITI) and fewer grains per channel bit, and therefore from a lower signal-to-noise ratio (SNR), which naturally demands 2-D read channel signal processing. Read channel signal processing primarily consists of partial response signal equalization and trellis-based signal detection. 2-D read channel signal processing can be realized in a variety of forms, corresponding to different tradeoffs between detection performance and computational complexity. Although 2-D read channel signal processing has been widely studied (see [3]–[17]), little prior work has focused on optimal 2-D detection, which carries out full 2-D equalization and 2-D detection jointly across all the tracks to fully exploit the 2-D interference and achieve the best detection performance. This is not surprising, because such optimal multitrack joint 2-D detection suffers from very high computational complexity, which raises justifiable concerns about its practical feasibility. Nevertheless, as CMOS technology continues to scale toward 20 nm and below [18], optimal 2-D detection can become an increasingly feasible and attractive option for future shingled magnetic recording, and hence warrants a thorough study from both the detection performance and the silicon implementation perspectives.

Manuscript received October 16, 2013; revised November 26, 2013 and January 2, 2014; accepted January 9, 2014. Date of publication January 16, 2014; date of current version June 6, 2014. Corresponding author: N. Zheng (e-mail: ningzhengrpi@gmail.com).

Digital Object Identifier 10.1109/TMAG.2014.2300133

This paper investigates the signal processing performance and silicon implementation cost for realizing the optimal multitrack joint 2-D detection. It presents the mathematical formulations of 2-D generalized partial response (GPR) equalization and multitrack joint 2-D trellis detection. Because of the pervasive use of low-density parity-check (LDPC) codes with soft-decision decoding in hard disk drives, we are primarily interested in soft-output trellis detection using the soft-output Viterbi algorithm (SOVA). For the purpose of quantitative evaluation, we carry out simulations with severe 2-D interference, and the results clearly show a noticeable gain of multitrack joint 2-D detection over single-track 1-D detection. For its silicon implementation, we first present an interleaved pipelining strategy to maximize the hardware utilization, and then carry out ASIC design for different configurations. The design results show that the silicon cost of optimal multitrack joint 2-D SOVA detectors could be practically affordable, especially when considering the continuous CMOS technology scaling. Therefore, the results of this paper suggest that optimal multitrack joint 2-D SOVA detection, which has been conventionally considered to be impractical due to its high computational complexity, indeed can be a practically viable option and deserves serious consideration in the research community.

## II. MULTITRACK JOINT 2-D SIGNAL DETECTION FOR SHINGLED RECORDING

The Viterbi algorithm has been widely used to realize maximum likelihood signal detection for data storage and communication with interference. In current practice, Viterbi detectors aim to recover only one bitstream (e.g., the bits on the main track in hard disk drives). For systems with significant 2-D interference, such as the intersymbol interference (ISI) and ITI in shingled recording, current practice does not fully utilize the abundant interference information, leading to suboptimal detection performance. This naturally calls for true 2-D signal detection, which jointly detects the data on multiple adjacent tracks to fully utilize the 2-D interference. This is conceptually similar to multiple-input multiple-output (MIMO) wireless communication [19], [20], in which true 2-D signal detection is needed to fully utilize the interference among multiple transmit and receive antennas and achieve optimal MIMO signal detection.

Fig. 1. Schematic of read channel with 2-D equalization and 2-D GPR target.

0018-9464 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

This section discusses the realization of 2-D Viterbi detection from the algorithm perspective. The key is to exploit the ITI rather than cancel it [9], [21]. In the following, we first describe the design of 2-D GPR equalization, then introduce the 2-D Viterbi detection process, and finally present simulation results.

### *A. 2-D Equalization With GPR Target*

This paper employs 2-D equalization with a GPR target because of the superior performance of GPR targets at high storage areal density [22]. Fig. 1 shows the schematic of a read channel with 2-D equalization and a 2-D GPR target, where *j* and *k* indicate the cross-track and down-track bit positions, respectively.

The equalization polynomial  $F(D_1, D_2)$  and target polynomial  $G(D_1, D_2)$  can be written as

$$
F(D_1, D_2) = \sum_{m=-M}^{M} \sum_{n=-N}^{N} f_{m,n} D_1^m D_2^n \qquad (1)
$$

$$
G(D_1, D_2) = \sum_{p=-P}^{P} \sum_{q=-Q}^{Q} g_{p,q} D_1^p D_2^q \qquad (2)
$$

where  $D_1$  and  $D_2$  are unit shifts along the cross-track and down-track directions, respectively. In matrix form, the coefficients can be expressed as

$$
\mathbf{F} = \begin{bmatrix} f_{-M,-N} & f_{-M,-N+1} & \cdots & f_{-M,N} \\ f_{-M+1,-N} & f_{-M+1,-N+1} & \cdots & f_{-M+1,N} \\ \vdots & \vdots & \ddots & \vdots \\ f_{M,-N} & f_{M,-N+1} & \cdots & f_{M,N} \end{bmatrix}
$$

$$
\mathbf{G} = \begin{bmatrix} g_{-P,-Q} & g_{-P,-Q+1} & \cdots & g_{-P,Q} \\ g_{-P+1,-Q} & g_{-P+1,-Q+1} & \cdots & g_{-P+1,Q} \\ \vdots & \vdots & \ddots & \vdots \\ g_{P,-Q} & g_{P,-Q+1} & \cdots & g_{P,Q} \end{bmatrix}.
$$

The equalization output  $z(j, k)$  is the 2-D convolution of the equalizer coefficient matrix **F** and the channel output  $y(j, k)$. Similarly, the desired (target) output  $d(j, k)$  is the 2-D convolution of the target coefficient matrix **G** and the channel input  $a(j, k)$.
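As a minimal illustration of these 2-D convolutions, the following NumPy sketch evaluates $z(j,k) = \sum_{m,n} f_{m,n}\, y(j-m, k-n)$ for a coefficient array laid out like **F** (row index $m$ from $-M$ to $M$, column index $n$ from $-N$ to $N$). The function name `conv2d_centered` and the zero-valued boundary handling are assumptions of this sketch, not part of the read channel described here.

```python
import numpy as np

def conv2d_centered(coef, x):
    """2-D convolution z(j, k) = sum_{m,n} coef_{m,n} * x(j - m, k - n).

    coef has shape (2M+1, 2N+1) and coef[m + M, n + N] holds the tap for
    offset (m, n), matching the row/column order of F.
    Samples outside x are taken as zero (a boundary assumption of this sketch).
    """
    M, N = coef.shape[0] // 2, coef.shape[1] // 2
    J, K = x.shape
    xp = np.pad(x, ((M, M), (N, N)))          # zero-pad cross-track and down-track
    z = np.zeros((J, K))
    for m in range(-M, M + 1):
        for n in range(-N, N + 1):
            # xp[j + M - m, k + N - n] equals x(j - m, k - n)
            z += coef[m + M, n + N] * xp[M - m:M - m + J, N - n:N - n + K]
    return z
```

The same routine gives the target output, e.g. `d = conv2d_centered(G, a)` for a target array `G` laid out like **G** and a bipolar input array `a`.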

Joint design of the 2-D equalizer and the equalization target aims to minimize the mean square error (MSE) between  $z(j, k)$  and  $d(j, k)$. For convenience, we rearrange the channel input sequence, channel output sequence, equalizer coefficient matrix, and target coefficient matrix row-wise into vectors as follows:

$$
\begin{aligned}
\mathbf{a}_{k} &= [a_{P,k+Q} \ a_{P,k+Q-1} \ \cdots \ a_{-P,k-(Q-1)} \ a_{-P,k-Q}]^{T} \\
\mathbf{y}_{k} &= [y_{M,k+N} \ y_{M,k+N-1} \ \cdots \ y_{-M,k-(N-1)} \ y_{-M,k-N}]^{T} \\
\mathbf{f} &= [f_{-M,-N} \ f_{-M,-N+1} \ \cdots \ f_{M,N-1} \ f_{M,N}]^{T} \\
\mathbf{g} &= [g_{-P,-Q} \ g_{-P,-Q+1} \ \cdots \ g_{P,Q-1} \ g_{P,Q}]^{T}.
\end{aligned}
$$

Accordingly, the error signal can be written as

$$
e_k = \mathbf{f}^T \mathbf{y}_k - \mathbf{g}^T \mathbf{a}_k \tag{3}
$$

and the MSE can be obtained as

$$
E\left(e_k^2\right) = \mathbf{f}^T \mathbf{R} \mathbf{f} - 2\mathbf{f}^T \mathbf{T} \mathbf{g} + \mathbf{g}^T \mathbf{A} \mathbf{g} \tag{4}
$$

where  $\mathbf{A} = E\left\{ \mathbf{a}_k \mathbf{a}_k^T \right\}$  is the autocorrelation matrix of the channel input,  $\mathbf{R} = E\left\{ \mathbf{y}_k \mathbf{y}_k^T \right\}$  is the autocorrelation matrix of the channel output, and  $\mathbf{T} = E\left\{ \mathbf{y}_k \mathbf{a}_k^T \right\}$  is the cross-correlation matrix of the channel output and input.

To avoid the trivial solution  $\mathbf{f} = \mathbf{0}$,  $\mathbf{g} = \mathbf{0}$  when minimizing the MSE, we impose the constraints on **g**

$$
\mathbf{E}^T\mathbf{g} = \mathbf{c}
$$

which force certain entries of **g** to take specific values. Here,  $\mathbf{E}^T$  is a matrix with one row for each entry of **g** that is fixed rather than optimized (each row is a unit vector selecting that entry), and **c** is the column vector of the corresponding fixed values. For example, assume the target coefficient vector **g** is constrained as

$$
\mathbf{g} = [g_{-1,-1} \quad g_{-1,0} \quad g_{-1,1} \quad g_{0,-1} \quad 1 \quad g_{0,1} \quad 0 \quad 0 \quad 0]^T.
$$

Then we can obtain the constraint matrix  $\mathbf{E}^T$  as

$$
\mathbf{E}^T = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}
$$

and the corresponding column vector **c** as

$$
\mathbf{c} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}^T.
$$

The constraints lead to the following Lagrange function for minimizing the MSE:

$$
J = \mathbf{f}^T \mathbf{R} \mathbf{f} - 2 \mathbf{f}^T \mathbf{T} \mathbf{g} + \mathbf{g}^T \mathbf{A} \mathbf{g} - 2\lambda^T \left( \mathbf{E}^T \mathbf{g} - \mathbf{c} \right) \tag{5}
$$

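where  $\lambda$  is a column vector of Lagrange multipliers. Setting the gradient of  $J$  with respect to  $\mathbf{f}$  to zero gives  $\mathbf{f} = \mathbf{R}^{-1}\mathbf{T}\mathbf{g}$; substituting this into (5), setting the gradient with respect to  $\mathbf{g}$  to zero, and enforcing  $\mathbf{E}^T\mathbf{g} = \mathbf{c}$  yields the optimized target and equalizer coefficient vectors as follows: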

$$
\lambda = \left( \mathbf{E}^T \left( \mathbf{A} - \mathbf{T}^T \mathbf{R}^{-1} \mathbf{T} \right)^{-1} \mathbf{E} \right)^{-1} \mathbf{c} \tag{6}
$$

$$
\mathbf{g} = \left(\mathbf{A} - \mathbf{T}^T \mathbf{R}^{-1} \mathbf{T}\right)^{-1} \mathbf{E} \lambda \tag{7}
$$

$$
\mathbf{f} = \mathbf{R}^{-1} \mathbf{T} \mathbf{g}. \tag{8}
$$
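To make (3)–(8) concrete, the following is a minimal NumPy sketch of the constrained MMSE design, assuming the statistics **R**, **T**, and **A** are replaced by sample averages over training data. The helper names `window` and `design_2d_gpr`, the circular toy channel built from the read sensitivity $h$ of Section II-C, the noise level, and the $3 \times 7$ equalizer and $3 \times 3$ target sizes are illustrative assumptions, not the configuration used to produce the results in this paper. The constraint reuses the example **E** and **c** given above.

```python
import numpy as np

def window(x, j, k, rows, cols):
    """Row-wise vector [x(j+rows, k+cols), ..., x(j-rows, k-cols)], the ordering of a_k and y_k."""
    return np.array([x[j + p, k + q]
                     for p in range(rows, -rows - 1, -1)
                     for q in range(cols, -cols - 1, -1)])

def design_2d_gpr(Y, A_vecs, E, c):
    """Constrained MMSE equalizer/target design following (6)-(8).

    Y:      (n, (2M+1)(2N+1)) array whose rows are y_k^T.
    A_vecs: (n, (2P+1)(2Q+1)) array whose rows are a_k^T.
    """
    n = Y.shape[0]
    R = Y.T @ Y / n                          # sample estimate of E{y_k y_k^T}
    T = Y.T @ A_vecs / n                     # sample estimate of E{y_k a_k^T}
    A = A_vecs.T @ A_vecs / n                # sample estimate of E{a_k a_k^T}
    B = A - T.T @ np.linalg.solve(R, T)      # A - T^T R^{-1} T
    B_inv_E = np.linalg.solve(B, E)          # B^{-1} E
    lam = np.linalg.solve(E.T @ B_inv_E, c)  # (6)
    g = B_inv_E @ lam                        # (7)
    f = np.linalg.solve(R, T @ g)            # (8)
    return f, g

# Toy data: random bipolar bits through the 3x3 read sensitivity h of Section II-C.
rng = np.random.default_rng(0)
h = np.array([[0.1621, 0.4026, 0.1621],
              [0.4026, 1.0,    0.4026],
              [0.1621, 0.4026, 0.1621]])
a = rng.choice([-1.0, 1.0], size=(7, 4000))
# Circular 2-D convolution y(j,k) = sum h_{p,q} a(j-p, k-q) (wrap-around is a simplification) plus AWGN.
y = sum(h[p + 1, q + 1] * np.roll(a, (p, q), axis=(0, 1))
        for p in (-1, 0, 1) for q in (-1, 0, 1))
y = y + 0.1 * rng.standard_normal(y.shape)

M, N, P, Q, j = 1, 3, 1, 1, 3          # 3x7 equalizer, 3x3 target, centre track 3
ks = range(10, a.shape[1] - 10)
Y = np.array([window(y, j, k, M, N) for k in ks])
A_vecs = np.array([window(a, j, k, P, Q) for k in ks])

# Constraint of the example above: g_{0,0} = 1 and g_{1,-1} = g_{1,0} = g_{1,1} = 0.
E = np.zeros((9, 4))
E[[4, 6, 7, 8], [0, 1, 2, 3]] = 1.0
c = np.array([1.0, 0.0, 0.0, 0.0])
f, g = design_2d_gpr(Y, A_vecs, E, c)
mse = np.mean((Y @ f - A_vecs @ g) ** 2)   # sample version of (4)
```

Solving linear systems with `np.linalg.solve` instead of forming explicit inverses is a numerical convenience of the sketch; it evaluates the same expressions as (6)–(8).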

### *B. Multitrack Joint 2-D Viterbi Detection*

With the 2-D GPR target, we can construct the corresponding 2-D trellis, onto which 2-D Viterbi detection (or its soft-output variants) can be applied to simultaneously detect multiple tracks that interfere with each other. For *S* equalized sequences and constraint length  $L$, the corresponding 2-D trellis consists of  $2^{SL}$  states, and each state has  $2^S$  incoming and  $2^S$  outgoing branches (thus  $2^{S(L+1)}$  possible state transitions in total). Assuming additive white Gaussian noise (AWGN) after equalization, we can calculate the branch metric, or equivalently the joint likelihood of the equalized data, as

$$
bm_u = \frac{1}{(2\pi)^{S/2} \sigma_1 \cdots \sigma_S} \exp\left[-\frac{1}{2} \sum_{i=1}^S \left(\frac{r_i - d_{i,u}}{\sigma_i}\right)^2\right], \quad u = 1, 2, \ldots, 2^{S(L+1)} \tag{9}
$$

where  $r_i$  and  $\sigma_i$  are the equalized signal and the noise standard deviation associated with the *i*-th sequence, respectively, and  $d_{i,u}$  is the target value associated with the *i*-th sequence and the *u*-th state transition. By taking the logarithm, dropping the constant term, and scaling by a factor of 2, we obtain an alternative branch metric  $\overline{bm}_u$  as

$$
\overline{bm}_{u} = -\sum_{i=1}^{S} \left( \frac{r_i - d_{i,u}}{\sigma_i} \right)^2, \quad u = 1, 2, \ldots, 2^{S(L+1)}. \tag{10}
$$

Assuming all the  $\sigma_i$  are equal, this branch metric reduces (up to a positive scale factor) to the negative sum of *S* squared Euclidean distances, from which the state transition metrics are calculated. Fig. 2 shows an example of a 2-D trellis state transition from time index *k* − 1 to *k* for three-track joint 2-D detection. In 2-D detection, the constraint length *L* (rather than the total number of states) determines the trace-back depth for generating the detection output; that is, *L* determines the trellis length beyond which all the survivor paths merge with very high probability. Note that the trellis in this multitrack detection runs only in the down-track direction, since the bits on different tracks are detected simultaneously. This differs from 2-D detection schemes whose state transitions run in both the down-track and the cross-track directions.
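The trellis construction and the add-compare-select (ACS) recursion with the branch metric (10) can be sketched as follows. This is a minimal hard-decision illustration, not the SOVA detector evaluated in this paper, and it rests on several assumptions: a hypothetical causal multitrack target array `G` of shape $(S, S, L+1)$ with $d_i(k) = \sum_{s,t} G[i,s,t]\, x_s(k-t)$ and bipolar $x = 2b - 1$, equal $\sigma_i = 1$ so that the cost $\sum_i (r_i - d_{i,u})^2 = -\overline{bm}_u$ is minimized, a known all-zero starting state, and a full-length traceback instead of a fixed depth $\delta$. The names `build_trellis` and `viterbi_multitrack` are illustrative.

```python
import itertools
import numpy as np

def build_trellis(S, L):
    """2-D trellis for S tracks with memory L: 2^S symbols, 2^(S*L) states."""
    symbols = list(itertools.product((0, 1), repeat=S))       # one bit per track
    states = list(itertools.product(range(2 ** S), repeat=L))  # last L symbols, newest first
    index = {s: i for i, s in enumerate(states)}
    # Appending new symbol u to state s drops the oldest symbol s[-1].
    nxt = np.array([[index[(u,) + s[:-1]] for u in range(2 ** S)] for s in states])
    return symbols, states, nxt

def viterbi_multitrack(r, G, L):
    """Hard-decision joint Viterbi detection of S tracks; r has shape (S, K)."""
    S, K = r.shape
    symbols, states, nxt = build_trellis(S, L)
    x = 2.0 * np.array(symbols) - 1.0                 # (2^S, S) bipolar symbols
    nS, nU = len(states), len(symbols)
    d = np.empty((nS, nU, S))                          # target values d_{i,u}
    for si, s in enumerate(states):
        for u in range(nU):
            win = x[[u, *s]]                           # rows: times k, k-1, ..., k-L
            d[si, u] = np.einsum("ist,ts->i", G, win)
    cost = np.full(nS, np.inf)
    cost[0] = 0.0                                      # known all-zero starting state
    prev = np.zeros((K, nS), dtype=int)                # survivor predecessor per state
    dec = np.zeros((K, nS), dtype=int)                 # survivor input symbol per state
    for k in range(K):
        bm = ((r[:, k] - d) ** 2).sum(axis=2)          # (nS, nU) branch costs = -bm_u
        new = np.full(nS, np.inf)
        for si in range(nS):                           # add-compare-select
            for u in range(nU):
                c, t = cost[si] + bm[si, u], nxt[si, u]
                if c < new[t]:
                    new[t], prev[k, t], dec[k, t] = c, si, u
        cost = new
    s = int(np.argmin(cost))                           # best final state
    bits = np.zeros((K, S), dtype=int)
    for k in range(K - 1, -1, -1):                     # traceback
        bits[k] = symbols[dec[k, s]]
        s = prev[k, s]
    return bits.T                                      # (S, K) detected bits
```

For $S = 2$ and $L = 2$, `build_trellis` yields $2^{SL} = 16$ states and $2^{S(L+1)} = 64$ transitions, matching the counts stated earlier in this subsection.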

Fig. 2. Illustration of the state transition in three-track joint 2-D detection.

The pervasive use of LDPC codes in hard disk drives demands soft-output signal detection. Therefore, we are primarily interested in soft-output multitrack 2-D detection using SOVA. For multitrack joint 2-D detection, we first treat the *S* bits detected in the same trellis step as one nonbinary symbol and apply SOVA to obtain the soft output associated with each symbol, from which we then extract the soft output associated with each bit. At the *k*th detection step, we maintain a  $(\delta - L) \times M$  reliability matrix for each state  $s_k$  as

$$
\hat{\mathbf{L}}(s_k) = \left[\hat{L}_{j,\mu}(s_k)\right], \quad \begin{array}{l} j = k - \delta + 1, \ldots, k - L \\ \mu = 0, \ldots, M - 1 \end{array}
$$

where  $\delta$  is the trace-back length, typically 5*L* to 7*L*, and  $M = 2^S$  (not to be confused with the equalizer span  $M$  in (1)).  $\hat{L}_{j,\mu}(s_k)$  is the path-metric difference between the survivor at state  $s_k$  and the most likely path terminating in state  $s_k$  with decision  $\mu$  at time *j*. To this end, we record the metric differences between the survivor path and all the other paths arriving at  $s_k$:

$$
\Delta_m = \Gamma(s_{k-1}^m, s_k) - \Gamma(s_k) \ m = 0, 1, ..., M - 1 \quad (11)
$$

where  $\Gamma(s_k)$  is the survivor path metric for  $s_k$  and  $\Gamma(s_{k-1}^m, s_k)$  is the metric of the path entering  $s_k$  from the predecessor state  $s_{k-1}^m$, i.e., the path with decision *m* at time  $k-L$. In the update procedure, we first set the reliability values at time  $k - L$  as

$$
\hat{L}_{k-L,\mu}(s_k) = \Delta_{\mu} \ \mu = 0, 1, ..., M - 1 \tag{12}
$$

and then update other values in  $\hat{\mathbf{L}}(s_k)$  as

$$
\hat{L}_{j,\mu}(s_k) = \min_{m \in \{0,1,\dots,M-1\}} \left\{ \hat{L}_{j,\mu}(s_{k-1}^m) + \Delta_m \right\}, \quad j = k - \delta + 1, \dots, k - L - 1, \ \mu = 0, 1, \dots, M - 1. \tag{13}
$$
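The per-state update in (12) and (13) can be written compactly with array operations. The NumPy sketch below handles one state  $s_k$  and assumes that the predecessor index *m* coincides with the decision at time  $k-L$  (as for a trellis whose state holds the last *L* symbols), that  $\Delta_m = 0$  for the survivor's predecessor, and that each reliability matrix stores rows in increasing *j*. The function name `sova_update` is hypothetical.

```python
import numpy as np

def sova_update(delta, prev_rel):
    """One SOVA reliability update for a single state s_k, following (12)-(13).

    delta:    shape (M,), Delta_m from (11); Delta_m = 0 for the survivor's predecessor.
    prev_rel: shape (M, W, M); prev_rel[m] is L_hat(s_{k-1}^m) with rows
              j = k-delta, ..., k-L-1 (W = delta - L rows) and columns mu = 0..M-1.
    Returns L_hat(s_k) with rows j = k-delta+1, ..., k-L (also W rows).
    """
    delta = np.asarray(delta, dtype=float)
    prev_rel = np.asarray(prev_rel, dtype=float)
    # (13): drop the oldest row j = k-delta (already released), then min over predecessors.
    older = np.min(prev_rel[:, 1:, :] + delta[:, None, None], axis=0)
    # (12): newest row j = k-L, where predecessor m carries decision mu = m.
    return np.vstack([older, delta[None, :]])
```

Because the survivor's predecessor has  $\Delta_m = 0$, the survivor's own decision keeps a reliability of 0 in every row, and competing decisions accumulate the metric penalty of the cheapest path that supports them.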

It has been proven that the updating rule in (12) and (13) is equivalent to the Max-Log-MAP algorithm [23]. The final log-likelihood ratio (LLR) of each bit is the difference between the minimum LLR of the symbols with '1' in that bit position and the minimum LLR of the symbols with '0' in that bit position. For example, consider two-track detection, and suppose the symbol LLRs for "00," "01," "10," and "11" are 1.0, 0, 0.8, and 0.6, respectively. Then the LLR for the first bit is  $0.6 - 0 = 0.6$, and the LLR for the second bit is  $0 - 0.8 = -0.8$.
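The symbol-to-bit conversion can be checked against the two-track example with a few lines of NumPy. The function name `symbol_to_bit_llr` and the indexing convention (the first bit of a symbol is its most significant bit, so "10" maps to index 2) are assumptions of this sketch.

```python
import numpy as np

def symbol_to_bit_llr(sym_rel, S):
    """Per-bit LLRs from per-symbol LLRs (Max-Log rule).

    sym_rel[mu] is the LLR of symbol mu; the first bit of a symbol is its
    most significant bit (e.g. "10" -> mu = 2).
    LLR of bit b = min over symbols with bit b = 1 minus min over symbols with bit b = 0.
    """
    sym_rel = np.asarray(sym_rel, dtype=float)
    mu = np.arange(2 ** S)
    llr = np.empty(S)
    for b in range(S):
        bit = (mu >> (S - 1 - b)) & 1          # value of bit b in every symbol
        llr[b] = sym_rel[bit == 1].min() - sym_rel[bit == 0].min()
    return llr

# Two-track example from the text: symbols "00", "01", "10", "11".
print(symbol_to_bit_llr([1.0, 0.0, 0.8, 0.6], S=2))   # LLRs 0.6 and -0.8
```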

### *C. Simulation Results*

We further carry out simulations to evaluate the performance of multitrack joint 2-D Viterbi detection and its advantages over single-track 1-D detection. In this paper, we use the following 2-D read sensitivity function:

$$
h = \begin{bmatrix} 0.1621 & 0.4026 & 0.1621 \\ 0.4026 & 1 & 0.4026 \\ 0.1621 & 0.4026 & 0.1621 \end{bmatrix}
$$

which implies severe ISI and ITI and corresponds, to some extent, to a highly scaled shingled recording scenario. With readback signals from *S* consecutive tracks, we carry out joint 2-D detection of the inner tracks, as illustrated in Fig. 3.



Fig. 3. Illustration of an example where five consecutive tracks are read back and the three inner tracks are simultaneously detected by joint 2-D Viterbi detection.



Fig. 4. Simulated detector output BER under different detection configurations.



Fig. 4 shows the simulated bit error rate (BER) versus SNR of joint 2-D Viterbi detection with different numbers of tracks detected concurrently. For comparison, it also shows the simulated results for detecting only one track, i.e., using a 2-D equalizer with a  $3 \times 7$  coefficient matrix and a 1-D GPR target with a  $1 \times 3$  target matrix, which is denoted as "3-T equalization, 1-T detection."

Fig. 5. Simulated LDPC decoding FER under different detection configurations.

In this paper, SNR is defined as  $20 \log_{10}(V_p/\sigma)$, where  $V_p$  is the peak signal of the readback waveform, and  $\sigma$  is the standard deviation of the AWGN added at the output of the channel before equalization. As shown in Fig. 4, compared with one-track detection, joint two-track, three-track, and four-track detection achieve about 2.7, 4.6, and 5.4 dB gain, respectively, at a BER of  $10^{-2}$. At a BER of  $4 \times 10^{-3}$, the SNR gains increase to 3.1, 5.6, and 6.8 dB, respectively. The more tracks are jointly detected, the larger the gain. This is intuitive, since jointly detecting more tracks better utilizes the 2-D interference. However, the incremental gain diminishes as the number of jointly detected tracks continues to increase, as also shown in Fig. 4: at a BER of  $10^{-2}$, going from two to three tracks adds 1.9 dB, whereas going from three to four tracks adds only 0.8 dB.
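Inverting the SNR definition gives the noise level directly,  $\sigma = V_p \cdot 10^{-\mathrm{SNR}/20}$. The NumPy sketch below applies this when generating noisy readback samples; the helper name `add_awgn` is hypothetical, and  $V_p$  is left as a caller-supplied parameter because its exact measurement point is not specified beyond "the peak signal of the readback waveform."

```python
import numpy as np

def add_awgn(readback, snr_db, v_peak, rng):
    """Add AWGN so that 20*log10(v_peak / sigma) equals snr_db; returns (noisy, sigma)."""
    sigma = v_peak / 10 ** (snr_db / 20)      # invert SNR = 20 log10(Vp / sigma)
    noisy = readback + sigma * rng.standard_normal(readback.shape)
    return noisy, sigma

rng = np.random.default_rng(0)
clean = np.ones((3, 10000))                   # placeholder readback samples
noisy, sigma = add_awgn(clean, snr_db=20.0, v_peak=1.0, rng=rng)   # sigma = 0.1
```

Each 6 dB increase in SNR roughly halves  $\sigma$, so the dB gains quoted above translate directly into the tolerable noise standard deviation.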

Fig. 5 shows the simulated frame error rate (FER) when SOVA detection is followed by a 4-KB, rate-8/9 LDPC code decoder. For joint 2-D detection, the LLR of each bit is obtained using the procedure described in Section II-B. For one-track detection, we use the SOVA detection described in [24], which is also equivalent to its Max-Log-MAP counterpart. As expected, the LDPC code decoding performance noticeably improves as we increase the number of tracks that are jointly detected.

Fig. 6. Illustration of 2-D signal detection architecture of (a) one of *n* separate detectors and (b) a pipelined detector at less silicon cost.

TABLE I
SYNTHESIS RESULTS OF VARIOUS SOVA IMPLEMENTATIONS

| Implementation ($L = 2$)          | Throughput (Mbytes/s) | Area for soft-information update (mm², 65 nm) | Total area (mm², 65 nm) | Area share of soft-information update (%) | Total area scaled to 16 nm (mm²) |
|-----------------------------------|-----------------------|-----------------------------------------------|-------------------------|-------------------------------------------|----------------------------------|
| 1-track detection                 | 111                   | 0.009                                         | 0.018                   | 51                                        | 0.001                            |
| 1-track detection with pipelining | 198                   | 0.011                                         | 0.025                   | 45                                        | 0.002                            |
| 2-track detection                 | 111                   | 0.238                                         | 0.280                   | 85                                        | 0.018                            |
| 2-track detection with pipelining | 211                   | 0.266                                         | 0.316                   | 84                                        | 0.020                            |
| 3-track detection                 | 107                   | 3.695                                         | 4.056                   | 91                                        | 0.254                            |
| 3-track detection with pipelining | 203                   | 3.919                                         | 4.332                   | 90                                        | 0.271                            |
| 4-track detection                 | 107                   | 58.211                                        | 61.535                  | 95                                        | 3.846                            |
| 4-track detection with pipelining | 209                   | 60.005                                        | 63.594                  | 94                                        | 3.975                            |

## III. SILICON IMPLEMENTATION OF JOINT 2-D DETECTOR

Practical implementation of multitrack joint 2-D detection involves a wide spectrum of tradeoffs between signal detection performance and detector implementation cost. At one extreme, each track is detected independently after 2-D equalization, which incurs the minimal implementation cost at the penalty of significant performance loss; at the other extreme, all the tracks are jointly detected using a single 2-D trellis, which achieves the best detection performance at the penalty of the highest implementation cost. Between these extremes, the tracks are partitioned into a number of sets, and the tracks within each set are jointly detected.

If all the tracks are partitioned into *n* track sets, the overall 2-D signal detection can be straightforwardly implemented by *n* separate 2-D Viterbi detectors. Fig. 6(a) shows one individual detector for a track set containing *K* tracks. It is well known that the Viterbi detection clock frequency is limited by the add-compare-select (ACS) latency, which heavily depends on the number of incoming branches at each trellis state. In 2-D signal detection, each trellis state has a relatively large number of incoming branches, leading to a relatively long ACS latency. As a result, each individual detector can only run at a relatively low clock frequency. Meanwhile, the parallel ACS units in all the *n* separate detectors occupy a large silicon area. We propose to apply pipelining, one of the most fundamental techniques in VLSI architecture design, to merge the detection of these *n* track sets into a single detector, as shown in Fig. 6(b). The key is to enable pipelining within the ACS loop by interleaving all the track sets: because successive ACS operations of the same track set are *n* clock cycles apart, an *n*-stage pipelined ACS loop does not violate the data dependency of the recursion. The pipelined ACS loop directly increases the achievable clock frequency, so the silicon cost can be reduced without sacrificing the overall 2-D detection throughput. In addition, interleaved detection makes it possible to aggregate the multiple ACS arrays used for updating the soft information, as illustrated in Fig. 6(b). Note that we use a combination of ACS and FIFO here instead of an *n*-stage ACS, because the width of the soft information can be smaller than that of the path metric.
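The interleaving argument can be checked with a small cycle-level model. The Python sketch below is a hypothetical toy, not the detector of Fig. 6(b): a two-state ACS recursion (`acs_step`) stands in for the much wider 2-D ACS, and the pipeline is modeled only as a write-back delay of `depth` cycles. It shows that when *n* track sets are fed round-robin into an *n*-stage pipelined ACS loop, each set always reads its own up-to-date path metrics, so one pipelined unit reproduces the results of *n* separate detectors.

```python
import random
from collections import deque

def acs_step(pm, bm):
    """Toy two-state add-compare-select: each new metric is the min over two predecessors."""
    return (min(pm[0] + bm[0], pm[1] + bm[1]),
            min(pm[0] + bm[2], pm[1] + bm[3]))

def separate_detectors(bm_seqs):
    """Reference: n independent detectors, each running its own ACS recursion."""
    results = []
    for seq in bm_seqs:
        pm = (0.0, 0.0)
        for bm in seq:
            pm = acs_step(pm, bm)
        results.append(pm)
    return results

def interleaved_pipelined(bm_seqs, depth):
    """Cycle-level model of ONE ACS unit whose results are written back `depth` cycles after issue."""
    n, K = len(bm_seqs), len(bm_seqs[0])
    pm = [(0.0, 0.0)] * n
    pipe = deque([None] * depth)                 # operations in flight
    for cycle in range(n * K + depth):
        done = pipe.popleft()                    # operation leaving the last stage
        if done is not None:
            s, result = done
            pm[s] = result                       # write back path metrics of set s
        if cycle < n * K:
            s, k = cycle % n, cycle // n         # round-robin interleaving of track sets
            pipe.append((s, acs_step(pm[s], bm_seqs[s][k])))
        else:
            pipe.append(None)                    # drain the pipeline
    return pm

rng = random.Random(0)
seqs = [[tuple(rng.random() for _ in range(4)) for _ in range(50)] for _ in range(3)]
same = interleaved_pipelined(seqs, depth=3) == separate_detectors(seqs)
```

In this model, a pipeline deeper than the number of interleaved sets would let a set issue before its previous result is written back, so it would read stale path metrics; this is why the interleaving degree and the pipeline depth are matched here.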

Table I lists the throughput and area for various SOVA implementations with constraint length  $L = 2$  and trace-back length  $\delta = 5L = 10$. All the designs are synthesized using the Synopsys tool set at the 65-nm technology node. Each soft-information value is represented with 8 bits, and two-stage pipelining is used. Pipelining significantly increases the detection throughput (and the clock frequency). The total area grows almost exponentially with the number of tracks, and the soft-information update block dominates the silicon area of the entire 2-D detector. Starting from the total detector area obtained at the 65-nm node, we apply a technology scaling rule to estimate the area at the projected 16-nm node, as listed in the last column of the table. The results show that, even for the joint detection of four tracks, the total silicon area may be practically tolerable at the projected 16-nm technology node.
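The figures in Table I can be cross-checked with simple arithmetic. In the Python sketch below, dividing each 65-nm total area by 16 (close to the ideal quadratic factor  $(65/16)^2 \approx 16.5$) reproduces the 16-nm column to within rounding; this divisor is our reading of the table, not a rule stated in the text. The sketch also computes throughput per unit area, which indicates how much the interleaved pipelining improves silicon efficiency for each track count. All names are illustrative.

```python
# Rows of Table I: (tracks, pipelined, throughput in Mbytes/s,
#                   total area in mm^2 at 65 nm, listed total area in mm^2 at 16 nm)
TABLE_I = [
    (1, False, 111, 0.018, 0.001), (1, True, 198, 0.025, 0.002),
    (2, False, 111, 0.280, 0.018), (2, True, 211, 0.316, 0.020),
    (3, False, 107, 4.056, 0.254), (3, True, 203, 4.332, 0.271),
    (4, False, 107, 61.535, 3.846), (4, True, 209, 63.594, 3.975),
]
SCALE_DIVISOR = 16  # assumption: reproduces the last column; ideal (65/16)**2 is about 16.5

def area_at_16nm(area_65nm, divisor=SCALE_DIVISOR):
    """Estimate the 16-nm area by dividing the 65-nm area by a constant factor."""
    return area_65nm / divisor

def throughput_per_area(row):
    """Throughput per unit silicon area at 65 nm, in (Mbytes/s)/mm^2."""
    return row[2] / row[3]

def pipelining_gain(tracks):
    """Ratio of pipelined to non-pipelined throughput per area for one track count."""
    plain, piped = (row for row in TABLE_I if row[0] == tracks)
    return throughput_per_area(piped) / throughput_per_area(plain)

for tracks in range(1, 5):
    print(f"{tracks}-track: {pipelining_gain(tracks):.2f}x more Mbytes/s per mm^2 with pipelining")
```

Because pipelining nearly doubles throughput while adding only a few percent of area for the larger detectors, the efficiency gain grows with the number of jointly detected tracks in these numbers.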

## IV. CONCLUSION

This paper presents a study of multitrack joint 2-D detection, which can fully exploit the 2-D interference in shingled magnetic recording to optimize detection performance, from both the algorithm and the silicon implementation perspectives. We first discuss the mathematical formulations of 2-D GPR equalization and multitrack joint 2-D trellis detection, with a focus on SOVA detection. We then carry out simulations that quantitatively demonstrate the noticeable detection performance advantage of multitrack joint 2-D detection over low-complexity single-track 1-D detection. We further evaluate its silicon implementation through ASIC design, where an interleaved pipelining strategy is proposed to minimize the detector silicon cost. The results show that, with projected CMOS technology scaling to 16 nm, such an optimal and computation-intensive 2-D detection approach may not necessarily incur a prohibitively large silicon area. Based on this study, we believe that optimal multitrack joint 2-D detection can be a practically viable option for future shingled magnetic recording disk drives.

## ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under Grant ECCS-1128148 and in part by the Advanced Storage Technology Consortium.

## REFERENCES

- [1] K. Miura, E. Yamamoto, H. Aoi, and H. Muraoka, "Estimation of maximum track density in shingled writing," *IEEE Trans. Magn.*, vol. 45, no. 10, pp. 3722–3725, Oct. 2009.
- [3] L. Barbosa, "Simultaneous detection of readback signals from magnetic recording tracks using array heads," *IEEE Trans. Magn.*, vol. 26, no. 5, pp. 2163–2165, Sep. 1990.
- [4] M. Vea and J. Moura, "Multichannel equalization for high track density magnetic recording," in *Proc. IEEE Int. Conf. Commun.*, May 1994, pp. 1221–1225.
- [5] P. Voois and J. Cioffi, "Multichannel signal processing for multiple-head digital magnetic recording," *IEEE Trans. Magn.*, vol. 30, no. 6, pp. 5100–5114, Nov. 1994.
- [6] E. Kurtas, J. Proakis, and M. Salehi, "Reduced complexity MLSD algorithms for multi-track magnetic recording systems," in *Proc. IEEE Int. Symp. Inf. Theory*, Jul. 1997, p. 137.
- [7] E. Soljanin and C. Georghiades, "Multihead detection for multitrack recording channels," *IEEE Trans. Inf. Theory*, vol. 44, no. 7, pp. 2988–2997, Nov. 1998.
- [8] S. Tosi and T. Conway, "Near maximum likelihood multi-track partial response detection for magnetic recording," in *Proc. IEEE Global Telecommun. Conf. (GLOBECOM)*, vol. 4, Dec. 2004, pp. 2445–2449.
- [9] W. Chang and J. Cruz, "Inter-track interference mitigation for bit-patterned magnetic recording," *IEEE Trans. Magn.*, vol. 46, no. 11, pp. 3899–3908, Nov. 2010.
- [10] E. Hwang, R. Negi, and B. Kumar, "Signal processing for near 10 Tbit/in² density in two-dimensional magnetic recording (TDMR)," *IEEE Trans. Magn.*, vol. 46, no. 6, pp. 1813–1816, Jun. 2010.
- [11] M. Elidrissi, K. S. Chan, K. K. Teo, K. Eason, E. Hwang, B. Kumar, *et al.*, "Modeling of 2-D magnetic recording and a comparison of data detection schemes," *IEEE Trans. Magn.*, vol. 47, no. 10, pp. 3685–3690, Oct. 2011.
- [12] L. Pan, W. Ryan, R. Wood, and B. Vasic, "Coding and detection for rectangular-grain TDMR models," *IEEE Trans. Magn.*, vol. 47, no. 6, pp. 1705–1711, Jun. 2011.
- [13] M. Yamashita, Y. Okamoto, Y. Nakamura, H. Osawa, K. Miura, S. J. Greaves, *et al.*, "Modeling of writing process for two-dimensional magnetic recording and performance evaluation of two-dimensional neural network equalizer," *IEEE Trans. Magn.*, vol. 48, no. 11, pp. 4586–4589, Nov. 2012.
- [14] R. M. Todd, E. Jiang, R. L. Galbraith, J. R. Cruz, and R. W. Wood, "Two-dimensional Voronoi-based model and detection for shingled magnetic recording," *IEEE Trans. Magn.*, vol. 48, no. 11, pp. 4594–4597, Nov. 2012.
- [15] S. Khatami and B. Vasic, "Detection for two-dimensional magnetic recording systems," in *Proc. ICNC*, 2013, pp. 535–539.
- [16] Y. Wu, J. A. O'Sullivan, N. Singla, and R. S. Indeck, "Iterative detection and decoding for separable two-dimensional intersymbol interference," *IEEE Trans. Magn.*, vol. 39, no. 4, pp. 2115–2120, Jul. 2003.
- [17] S. Karakulak, P. H. Siegel, J. K. Wolf, and H. N. Bertram, "Joint-track equalization and detection for bit patterned media recording," *IEEE Trans. Magn.*, vol. 46, no. 9, pp. 3639–3647, Sep. 2010.
- [18] K. Kim, "Future silicon technology," in *Proc. ESSDERC*, 2012, pp. 1–6.
- [19] B. M. Hochwald and S. Ten Brink, "Achieving near-capacity on a multiple-antenna channel," *IEEE Trans. Commun.*, vol. 51, no. 3, pp. 389–399, Mar. 2003.
- [20] A. J. Paulraj, D. A. Gore, R. U. Nabar, and H. Bölcskei, "An overview of MIMO communications—A key to gigabit wireless," *Proc. IEEE*, vol. 92, no. 2, pp. 198–218, Feb. 2004.
- [21] S. Nabavi, B. V. K. V. Kumar, and J.-G. Zhu, "Modifying Viterbi algorithm to mitigate intertrack interference in bit-patterned media," *IEEE Trans. Magn.*, vol. 43, no. 6, pp. 2274–2276, Jun. 2007.
- [22] P. Kovintavewat, I. Ozgunes, E. Kurtas, J. R. Barry, and S. W. McLaughlin, "Generalized partial-response targets for perpendicular recording with jitter noise," *IEEE Trans. Magn.*, vol. 38, no. 5, pp. 2340–2342, Sep. 2002.
- [23] C. Ling, X. Wu, and X. Yi, "On SOVA for nonbinary codes," *IEEE Commun. Lett.*, vol. 3, no. 12, pp. 335–337, Dec. 1999.
- [24] M. P. C. Fossorier, F. Burkert, S. Lin, and J. Hagenauer, "On the equivalence between SOVA and Max-Log-MAP decodings," *IEEE Commun. Lett.*, vol. 2, no. 5, pp. 137–139, May 1998.