

Designs of three representative types of 3-D systems are described, evaluated and reviewed to illustrate the potential benefits and risks of these systems.

By Tong Zhang, Senior Member IEEE, Rino Micheloni, Senior Member IEEE, Guoyan Zhang, Zhaoran Rena Huang, and James Jian-Qiang Lu, Senior Member IEEE

ABSTRACT | Three-dimensional (3-D) integration of systems by vertically stacking and interconnecting multiple materials, technologies, and functional components offers a wide range of benefits, including speed, bandwidth and density increase, power reduction, small form factor, packaging reduction, yield and reliability increase, flexible heterogeneous integration with multifunctionality, and overall cost reduction. A new spectrum of opportunities and challenges arises for integrated system designers, which warrants rethinking and innovations from system design perspectives. By selecting three representative cases, i.e., solid-state data storage, power delivery, and hybrid radio-frequency/optical transceiver for distributed sensor networks, this paper intends to exemplify the potentials of exploiting the benefits of 3-D integration technology from system perspectives.

KEYWORDS | Flash memory; hybrid communication; integrated circuits; on-chip power delivery; 3-D integration

T. Zhang, G. Zhang, Z. R. Huang, and J. J.-Q. Lu are with the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12181 USA (e-mail: tzhang@ecse.rpi.edu; Gloria.zhanggy@gmail.com; huangz3@rpi.edu; luj@rpi.edu).

INVITED PAPER

R. Micheloni is with Qimonda Flash GmbH, Munich, Germany (e-mail: rino.micheloni@qimonda.com).

Digital Object Identifier: 10.1109/JPROC.2008.2007478

# I. INTRODUCTION

There have been significant developments in three-dimensional (3-D) integration technology towards high manufacturability, which is expected to lead to an industry paradigm shift [1], [2]. The most compelling potential benefits of 3-D integration include multifunctionality, increased performance, reduced power, small form factor, reduced packaging, increased yield and reliability, flexible heterogeneous integration, and reduced overall costs. As a result, 3-D integration has been widely considered as an enabling technology for future integrated circuits (ICs) and low-cost micro/nano/electro-opto/bio heterogeneous integrated systems [2].

The family of 3-D integration technology includes packaging-based 3-D integration such as system-in-package (SiP) and package-on-package (PoP), die-to-die and die-to-wafer 3-D integration, and wafer-level back end of the line (BEOL)-compatible 3-D integration. In general, all 3-D integration technologies offer high-density component integration with small form factor (small size and light weight), reduced packaging, and reduced power (fewer I/Os to be driven). Detailed discussions on the 3-D technologies have been presented in many publications, including the papers in this Special Issue.

The potential benefits and risks of 3-D integration can only be explored to the full extent when it is integrally considered from the very beginning of the overall integrated system design. This certainly provides a new spectrum of opportunities and challenges for system designers and warrants significant rethinking and innovations from system design perspectives. This paper attempts to exemplify the potential benefits and risks of 3-D integration technologies from system perspectives through three representative cases, including solid-state data storage, power delivery, and hybrid radio-frequency (RF)/optical transceiver for distributed sensor networks.

Manuscript received July 26, 2007; revised May 26, 2008. Current version published February 27, 2009.

0018-9219/\$25.00 ©2009 IEEE

Vol. 97, No. 1, January 2009 | PROCEEDINGS OF THE IEEE 161

The first case study focuses on NAND Flash memory systems using 3-D integration. Driven by the ever-growing demand for higher capacity nonvolatile solid-state data storage in the emerging pervasive computing and communication era, NAND Flash memory has become one of the fastest growing segments in the global semiconductor industry. Besides technology scaling, the storage density of NAND Flash memory can be greatly improved by using the multilevel storage technique, which stores more than 1 bit in each memory cell. With 3-D integration technology, NAND Flash memory dies are stacked vertically and share common peripheral functions, leading to a dramatic storage density increase. Furthermore, 3-D integration makes it feasible to use very powerful and complex peripheral function design techniques, in particular on-chip fault tolerance in multilevel Flash memory. We will show that this enables the most aggressive use of the multilevel storage technique to push the envelope of storage density at minimal cost.

The second case study focuses on 3-D power delivery for highly integrated systems, such as high-performance microprocessors, application-specific integrated circuits (ASICs), and 3-D integrated systems. Conventional power delivery methods for microprocessors and high-performance ASICs have fundamental limitations in meeting the power requirements of future IC technologies. A 3-D power delivery approach was proposed as a possible solution to the problems of conventional two-dimensional (2-D) power delivery [3], [4]. In this paper, we address the key aspects of 3-D power delivery systems. Based on a review of relevant challenges with conventional power delivery, a 3-D power delivery scheme is introduced with a list of advantages. The unique capabilities offered by 3-D integration are highlighted. Design methodologies of 3-D power delivery are discussed with design guidelines provided and potential issues addressed.

The third case study focuses on an integrated on-chip hybrid RF/optical wireless transmitter through 3-D SiP to create a sensor node for distributed sensor networks. The analog, digital, memory, and sensing chips are fabricated and optimized individually for the best combination of performance and cost. All the chips are in bare-die form and are hosted on a Si substrate through vertical chip bonding. The on-chip RF antennas are fabricated on the host Si substrate, and the optical elements are integrated onto the host substrate by thin-film wafer bonding. The 3-D vertical integration scheme allows the antenna and optical elements to be combined on a low-cost Si platform while keeping a short reach to their driver circuits. The 3-D integrated sensor node achieves an extremely compact package while expanding its functions from sensing to data storage, signal processing, and wireless network connection.

# II. 3-D INTEGRATED NAND FLASH MEMORY SYSTEMS

This section addresses the potential NAND Flash memory system design innovations enabled by 3-D integration technologies. First, the NAND Flash memory architecture in current practice is reviewed. As illustrated in Fig. 1, a NAND Flash memory typically consists of two planes of cell array, and the peripheral circuits mainly contain row decoders, charge pumps, and control logic (including page buffer). Each cell array is partitioned into blocks, and each block is further partitioned into pages. Read and programming operations are carried out per page, and erase is carried out per block. The typical page size ranges from 512 byte to 4 Kbyte of user data. Furthermore, the multilevel (ML) concept, i.e., to store more than 1 bit in each memory cell, has been widely used as an effective means to increase NAND Flash memory storage capacity [5]. Most present ML NAND Flash memories store 2 bits in each cell. Due to the inherently reduced operational margin, ML NAND Flash memories typically employ error-correcting code (ECC) to ensure storage reliability, where relatively weak ECC with up to 5-bit error-correcting capability is currently used [6]. Readers are referred to recently reported NAND Flash memories [7]–[10] for further information and [11] for detailed discussions on Flash memory circuit design and implementation.

Fig. 1. Example floorplan schematic of NAND Flash memories and a rough area breakdown among different parts.

In the context of 3-D integrated NAND Flash memory systems, we are particularly interested in the integration technologies that offer high interdie interconnect densities such as die-on-wafer or wafer-level integration. This enables the integrated Flash die to share some circuits and function blocks, which provides designers an unprecedented design space to pursue innovative memory system design solutions. In this section, we attempt to demonstrate the potential by presenting two possible new design approaches enabled by 3-D integration.

## A. Peripheral Circuits Sharing

The first approach of leveraging 3-D integration is relatively straightforward. As illustrated in Fig. 1, Flash memory peripheral circuits occupy a nonnegligible portion of a Flash die. Hence, for 3-D integrated Flash memories, a certain number of Flash dies can share some peripheral circuits. Since through-silicon vias (TSVs) are required to realize circuit sharing among different dies, the silicon cost of those TSVs must be taken into account.

Among various peripheral circuits, control logic (including page buffer) and charge pumps tend to dominate the silicon area. Meanwhile, the number of TSVs induced by sharing control logic and charge pumps is slightly larger than the page size. Given the modest value of page size (i.e., up to 4 Kbyte) and the feasibility of fabricating TSVs with 4  $\mu$ m pitch [1] (corresponding to a density of 64 K TSVs/mm<sup>2</sup>), sharing control logic and charge pumps may require TSVs that occupy less than 1 mm<sup>2</sup> of silicon area, which can be much less than the area occupied by the control logic and charge pumps themselves. On the other hand, sharing row decoders may not be justifiable since row decoders occupy a relatively small silicon area but drive a large number of word-lines in the Flash cell array. For example, considering a 4 Gb Flash die with 4 Kbyte page size, row decoders must drive 128 K word-lines. The silicon area occupied by 128 K TSVs may be comparable to or even larger than that of the row decoders themselves. Therefore, we expect that 3-D integrated Flash memories should share control logic and charge pumps as much as possible and keep row decoders on each Flash die.
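The TSV area argument above can be checked with a few lines of arithmetic. The Python sketch below is only a rough estimate under assumptions that are not stated in the text: TSVs sit on a square grid at the quoted 4 µm pitch, sharing control logic needs about one TSV per page bit, and each word-line holds exactly one page. The function and constant names are hypothetical.

```python
def tsv_area_mm2(num_tsvs: int, pitch_um: float = 4.0) -> float:
    """Silicon area taken by num_tsvs TSVs placed on a square grid (assumed layout)."""
    pitch_mm = pitch_um / 1000.0
    return num_tsvs * pitch_mm ** 2


# Assumption: one TSV per bit of a 4 Kbyte page when sharing control logic / page buffer.
PAGE_BITS = 4 * 1024 * 8            # 32768
# Assumption: one page per word-line, so a 4 Gb die has DIE_BITS / PAGE_BITS word-lines.
DIE_BITS = 4 * 2 ** 30
WORD_LINES = DIE_BITS // PAGE_BITS  # 131072, i.e. 128 K

shared_logic_area = tsv_area_mm2(PAGE_BITS)
row_decoder_area = tsv_area_mm2(WORD_LINES)

print(f"TSVs for shared control logic: {PAGE_BITS} -> {shared_logic_area:.2f} mm^2")
print(f"TSVs for shared row decoders:  {WORD_LINES} -> {row_decoder_area:.2f} mm^2")
```

Under these assumptions the page-wide TSV bundle stays near half a square millimeter, while a word-line-wide bundle is about four times larger, which is consistent with keeping row decoders local to each die.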

As one of the most straightforward ways to leverage 3-D integration, the above peripheral circuit sharing may gain reasonable silicon area reduction without any significant system design changes. Certainly, this *sharing* concept can be extended further to enable significant changes in Flash memory system design that improve overall system performance but might be too expensive in conventional 2-D design. In the remainder of this section, we present one such possible scenario from the ML NAND Flash ECC design perspective.

## B. ECC-Active ML NAND Flash Design

In conventional ML NAND Flash memory design, ECC plays a passive role in the sense that designers endeavor to push the raw error rate of memory cores as low as possible and use ECC to handle the residual *minimized* raw unreliability. Motivated by the great success of powerful ECC in modern data communication and storage systems, we expect that better system design solutions are achievable by jointly considering the design of Flash memory core and ECC. Under this new ECC-active design paradigm, as illustrated in Fig. 2, ECC handles an *optimized* raw unreliability where the optimization is carried out subject to specific metrics such as storage capacity, read/programming throughput and latency, and/or retention lifetime.

This ECC-active design paradigm most likely favors ECCs that are (much) stronger than what is being used today. As a result, the ECC-induced implementation overhead can severely limit its use in conventional 2-D design. In 3-D integrated ML NAND Flash memories, ECC can be shared among all the Flash dies, which greatly alleviates the implementation overhead issue and hence warrants serious consideration of such an ECC-active design paradigm. In this paper, using Bose–Chaudhuri–Hocquenghem (BCH) codes as ECC, we demonstrate the effectiveness of this ECC-active design paradigm for 3-D integrated Flash memories. The objective here is to maximize the effective Flash memory storage density subject to a prespecified Flash memory programming time.



Fig. 2. Switch from ECC-passive to ECC-active design paradigm.




Fig. 3. Schematic of program-and-verify programming for l levels/cell NAND Flash memories.

1) Background and Key Assumptions: ML Flash memory programming is realized through sequential program-and-verify with a staircase  $V_{pp}$  ramp of programming voltage as illustrated in Fig. 3. The width of each programming voltage pulse is denoted as  $T_p$ . Let l denote the number of storage levels per cell: read circuits in ML NAND Flash memories take l - 1 cycles to finish one read/ verification. Let  $T_v$  denote the period of one read/ verification cycle: each read/verification operation takes  $(l-1) \cdot T_v$ , as illustrated in Fig. 3. The number of program-and-verify cycles required by Flash memory programming is proportional to  $1/V_{pp}$ . Therefore, letting  $T_{prog}$  denote the programming time, we have

$$T_{\rm prog} \propto \frac{T_p + (l-1) \cdot T_v}{V_{pp}}. \tag{1}$$

Meanwhile, the width of each programmed memory cell threshold voltage distribution grows in proportion to  $V_{pp}$ . It is well known that Flash memory cell threshold voltage can be affected by many factors such as floating-gate interference and source-line noise. In this paper, we use a Gaussian distribution to model the variation of Flash cell threshold voltage and assume the corresponding standard deviation is proportional to  $V_{pp}$ .

We use a four-level/cell memory with 4 Kbyte page size as a baseline. Fig. 4 shows the memory cell threshold voltage distribution model, where the standard deviation is 0.16 V (i.e.,  $\pm 3\sigma$  of 0.96 V) and the voltage difference between the means of the two outer distributions is 6.2 V. Its raw bit error rate is only  $3.97 \times 10^{-11}$ ; hence we may consider this memory to be optimized under the ECC-passive design paradigm. With the target page error rate of lower than  $10^{-14}$ , a 2-bit error correcting BCH code is sufficient in this context.

Fig. 4. The threshold voltage distribution model with the standard deviation of 0.16 V in the baseline four levels/cell memory.
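The claim that a 2-bit error correcting code meets the $10^{-14}$ page error target for the baseline can be checked numerically. The Python sketch below assumes independent bit errors, so the page fails when more than $t$ of the $n$ codeword bits are wrong (a binomial tail). The codeword length $n = 32799$ is taken from the baseline BCH code; reusing the same $n$ for the $t = 1$ comparison is a simplification, and the function name is hypothetical.

```python
import math


def page_error_rate(n: int, t: int, p: float) -> float:
    """P(more than t of n bits are in error), assuming i.i.d. bit errors with probability p."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(t + 1, n + 1):
        # Binomial term C(n, i) p^i (1-p)^(n-i), evaluated in the log domain.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Past the mean the terms only shrink; stop once they no longer matter.
        if i > n * p and term < total * 1e-18:
            break
    return total


RAW_BER = 3.97e-11   # baseline raw bit error rate quoted in the text
N_BASELINE = 32799   # codeword length of the baseline t = 2 BCH code

print(f"t = 1: {page_error_rate(N_BASELINE, 1, RAW_BER):.2e}")
print(f"t = 2: {page_error_rate(N_BASELINE, 2, RAW_BER):.2e}")
```

With these inputs, single-error correction leaves the page error rate above the target, while double-error correction brings it several orders of magnitude below it.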

2) ECC-Active Memory System Design: The ECC-active design paradigm may be used to greatly improve memory storage density in 3-D integrated memory while maintaining the same memory programming timing, compared with the above baseline four levels/cell memory. We intend to increase the effective storage density by increasing the number of storage levels per cell *l* from four to five, six, seven, and eight. We store nine bits per four cells for l = 5, five bits per two cells for l = 6, 11 bits per four cells for l = 7, and three bits per cell for l = 8, respectively. Let  $V_{pp}^{(l)}$  denote the programming voltage staircase ramp in the *l* levels/cell memory. In order to maintain the same memory programming time as the baseline four levels/cell memory, based on (1) and assuming  $T_p = 24 \ \mu s$  and  $T_v = 3 \ \mu s$  [12], we have

$$V_{pp}^{(4)}: V_{pp}^{(5)}: V_{pp}^{(6)}: V_{pp}^{(7)}: V_{pp}^{(8)} = 33:36:39:42:45.$$

Let  $\sigma^{(l)}$  denote the memory cell threshold voltage distribution standard deviation of the *l* levels/cell memory. Because  $\sigma^{(l)}$  is proportional to  $V_{pp}^{(l)}$  and  $\sigma^{(4)} = 0.16$  V in the baseline four levels/cell memory, we have  $\sigma^{(5)} = 0.17$  V (with  $\pm 3\sigma^{(5)}$  of 1.02 V),  $\sigma^{(6)} = 0.19$  V (with  $\pm 3\sigma^{(6)}$  of 1.14 V),  $\sigma^{(7)} = 0.20$  V (with  $\pm 3\sigma^{(7)}$  of 1.20 V), and  $\sigma^{(8)} = 0.22$  V (with  $\pm 3\sigma^{(8)}$  of 1.32 V). Fig. 5 shows the corresponding memory cell threshold voltage distributions.
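The ratio and the standard deviations above follow directly from (1): holding $T_{\rm prog}$ fixed makes $V_{pp}^{(l)}$ proportional to $T_p + (l-1) \cdot T_v$, and $\sigma^{(l)}$ scales with $V_{pp}^{(l)}$. A minimal Python sketch of that arithmetic, with hypothetical function names, is:

```python
T_P_US = 24  # programming pulse width T_p in microseconds (value assumed in the text)
T_V_US = 3   # one read/verify cycle T_v in microseconds (value assumed in the text)


def cycle_time_us(levels: int) -> int:
    """Duration of one program-and-verify cycle: T_p + (l - 1) * T_v."""
    return T_P_US + (levels - 1) * T_V_US


def sigma_v(levels: int, base_levels: int = 4, base_sigma_v: float = 0.16) -> float:
    """Threshold-voltage sigma, scaled from the baseline in proportion to V_pp."""
    return base_sigma_v * cycle_time_us(levels) / cycle_time_us(base_levels)


for l in range(4, 9):
    s = sigma_v(l)
    print(f"l = {l}: V_pp ratio term = {cycle_time_us(l)}, "
          f"sigma = {s:.2f} V, +/-3 sigma = {6 * round(s, 2):.2f} V")
```

The $\pm 3\sigma$ widths are computed from the rounded $\sigma$ values, as in the text.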

Accordingly, the raw bit error rates are  $3.20 \times 10^{-6}(l=5)$ ,  $3.47 \times 10^{-4}(l=6)$ ,  $3.48 \times 10^{-3}(l=7)$ , and  $1.24 \times 10^{-2}(l=8)$ . With the page size of 4 Kbyte user data and a target page error rate of lower than  $10^{-14}$ , binary BCH codes are constructed by shortening primitive binary BCH codes under GF(2<sup>16</sup>). Table 1 lists the BCH code parameters and the corresponding page error rates, as well as the effective user bits stored in each cell.
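The raw bit error rates quoted above can be approximated from the Gaussian model. The Python sketch below rests on assumptions that the text does not spell out: the $l$ levels are equally spaced across the same 6.2 V span used for the baseline, read thresholds sit midway between adjacent means, only adjacent-level misreads are counted, each misread costs one bit (a Gray-like mapping), and the result is normalized by the stored bits per cell. Function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def raw_ber(levels: int, bits_per_cell: float, sigma_v: float, span_v: float = 6.2) -> float:
    """Approximate raw bit error rate of an l-level cell under the assumptions above."""
    half_gap_v = span_v / (levels - 1) / 2.0
    p_cross = q_function(half_gap_v / sigma_v)
    # Inner levels can be misread in two directions, the two outer levels in one.
    cell_error = 2.0 * (levels - 1) / levels * p_cross
    return cell_error / bits_per_cell


# (levels, stored bits per cell, sigma scaled as 0.16 V * V_pp ratio / 33)
cases = [(4, 2.0, 0.16), (5, 2.25, 0.16 * 36 / 33), (6, 2.5, 0.16 * 39 / 33),
         (7, 2.75, 0.16 * 42 / 33), (8, 3.0, 0.16 * 45 / 33)]
for levels, bits, sigma in cases:
    print(f"l = {levels}: raw BER ~ {raw_ber(levels, bits, sigma):.2e}")
```

Under these assumptions the sketch lands close to the figures in the text, using the unrounded $\sigma^{(l)}$ values.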

Fig. 5. The memory cell threshold voltage distribution of (a) l = 5, (b) l = 6, (c) l = 7, and (d) l = 8.

We further evaluate the BCH decoder implementation silicon cost using Synopsys tools and 0.13  $\mu$m complementary metal-oxide-semiconductor (CMOS) standard cell and SRAM libraries. Fig. 6 shows the BCH code decoder structure, in which the inversion-free Berlekamp-Massey (BM) algorithm [13] is used to solve the key equation. For detailed discussion on BCH code decoding, readers are referred to [14] and [15]. To improve the decoding speed, both the syndrome computation and Chien search blocks have a parallelism factor of four (i.e., each clock cycle the syndrome computation block receives four input bits and the Chien search block generates four output bits). Hence the latency of both blocks is  $\lceil n/4 \rceil$  clock cycles. For a *t*-error correcting BCH code, a fully serial realization of the key equation solver takes t(t+3)/2 clock cycles. For different BCH codes listed in Table 1, we set the parallelism factor of the key equation solver in such a way that its latency is less than  $\lceil n/4 \rceil/10$  clock cycles. The decoder first-in first-out buffer as shown in Fig. 6 is realized by SRAMs to minimize the silicon area cost. Table 2 shows the silicon area and decoding latency when the decoders operate at 400 MHz.
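The cycle counts in this paragraph translate into a simple latency estimate. The Python sketch below assumes that syndrome computation, key equation solving, and Chien search run back to back, and that a parallel key equation solver speeds up ideally by its parallelism factor; both are simplifying assumptions, and `kes_speedup` is a hypothetical parameter rather than a value given in the text.

```python
import math

CLOCK_MHZ = 400.0        # decoder clock frequency from the text
STREAM_PARALLELISM = 4   # bits per cycle for syndrome computation and Chien search


def stream_cycles(n: int) -> int:
    """Cycles for syndrome computation (or Chien search): ceil(n / 4)."""
    return math.ceil(n / STREAM_PARALLELISM)


def serial_kes_cycles(t: int) -> int:
    """Cycles of a fully serial key equation solver: t(t + 3) / 2."""
    return t * (t + 3) // 2


def decode_latency_us(n: int, t: int, kes_speedup: int = 1) -> float:
    """Total decode latency in microseconds, assuming the three stages run in sequence."""
    kes = math.ceil(serial_kes_cycles(t) / kes_speedup)
    return (2 * stream_cycles(n) + kes) / CLOCK_MHZ


print(f"(32799, 32768, 2):  {decode_latency_us(32799, 2):.1f} us")
print(f"(33519, 32768, 47): {decode_latency_us(33519, 47):.1f} us")
```

With a serial key equation solver, this estimate gives roughly 41.0 µs and 44.8 µs for the two codes shown. Stronger codes need a `kes_speedup` greater than one to meet the stated latency bound on the key equation solver.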

Table 1 BCH Code Parameters and Performance

| l | (n, k, t) BCH Codes | Page Error Rate        | User Bits per Cell |
|---|---------------------|------------------------|--------------------|
| 4 | (32799, 32768, 2)   | $3.68 \times 10^{-19}$ | 1.998              |
| 5 | (32895, 32768, 8)   | $3.94 \times 10^{-15}$ | 2.242              |
| 6 | (33519, 32768, 47)  | $1.36 \times 10^{-15}$ | 2.457              |
| 7 | (36367, 32768, 225) | $1.07 \times 10^{-15}$ | 2.539              |
| 8 | (44607, 32768, 740) | $6.38 \times 10^{-15}$ | 2.338              |

Based on the results listed in Tables 1 and 2, the optimum design solution can be examined assuming the NAND Flash memory cells are fabricated at the 70 nm technology node and each Flash die stores 8 Gb user data. The effective NAND Flash memory cell size is 0.024  $\mu$m<sup>2</sup> at 70-nm CMOS technology [9]. Therefore, the memory cell array in each Flash die occupies 103.18 mm<sup>2</sup> (l = 4), 91.98 mm<sup>2</sup> (l = 5), 84.35 mm<sup>2</sup> (l = 6), 83.20 mm<sup>2</sup> (l = 7), and 93.55 mm<sup>2</sup> (l = 8). Assuming the BCH decoders are still implemented using 0.13  $\mu$m technology, we may draw a conclusion that the design with l = 7 may have the minimum silicon area if the number of Flash dies is larger than six; otherwise the design with l = 6 will be the best choice. Both designs are clearly superior to the baseline four levels/cell design. Finally, it should be pointed out that we use BCH codes as ECC simply for the purpose of demonstration and by no means suggest BCH codes are the best choice. Further studies on various ECC schemes in this context certainly are a rewarding and interesting research direction.
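The area comparison can be reproduced with a short calculation. The Python sketch below assumes that 8 Gb means $8 \times 2^{30}$ bits and that the user bits per cell equal the stored bits per cell times the code rate $k/n$. This is an assumption on our part: it reproduces the quoted cell-array areas to two decimal places, although it does not match every entry in the "User Bits per Cell" column of Table 1. The total is modeled simply as the per-die array area times the number of dies plus one shared decoder; TSV and other peripheral area are ignored, so the die count at which l = 7 overtakes l = 6 in this sketch need not coincide exactly with the figure quoted above.

```python
CELL_AREA_UM2 = 0.024        # effective cell size at 70 nm, from the text
USER_BITS_PER_DIE = 8 * 2 ** 30
K = 32768                    # user bits per page (4 Kbyte)

# levels -> (stored bits per cell, codeword length n, decoder area in mm^2 from Table 2)
CONFIGS = {
    4: (2.00, 32799, 0.26),
    5: (2.25, 32895, 0.38),
    6: (2.50, 33519, 1.29),
    7: (2.75, 36367, 7.83),
    8: (3.00, 44607, 41.75),
}


def array_area_mm2(levels: int) -> float:
    """Cell-array area of one die, assuming user bits/cell = stored bits/cell * k/n."""
    stored_bits, n, _ = CONFIGS[levels]
    cells = USER_BITS_PER_DIE / (stored_bits * K / n)
    return cells * CELL_AREA_UM2 * 1e-6  # um^2 -> mm^2


def stack_area_mm2(levels: int, dies: int) -> float:
    """Array area of all dies plus one BCH decoder shared by the stack (simplified model)."""
    return dies * array_area_mm2(levels) + CONFIGS[levels][2]


def best_levels(dies: int) -> int:
    """Number of levels per cell that minimizes the modeled stack area."""
    return min(CONFIGS, key=lambda levels: stack_area_mm2(levels, dies))


for dies in (1, 4, 8):
    print(f"{dies} die(s): best l = {best_levels(dies)}")
```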

Table 2 BCH Decoder Silicon Implementation Metrics at 0.13  $\mu m$  Node

| l | (n, k, t) BCH Codes | Area $(mm^2)$ | Latency $(\mu s)$ |
|---|---------------------|---------------|-------------------|
| 4 | (32799, 32768, 2)   | 0.26          | 41.0              |
| 5 | (32895, 32768, 8)   | 0.38          | 41.3              |
| 6 | (33519, 32768, 47)  | 1.29          | 44.8              |
| 7 | (36367, 32768, 225) | 7.83          | 48.8              |
| 8 | (44607, 32768, 740) | 41.75         | 57.5              |

3) Integration With Defect Tolerance: The above study is subject to the constraint that the memory programming time always remains the same for various l. If this constraint can be relaxed to a certain extent, the design solutions using strong BCH codes may provide a good potential for realizing strong defect tolerance. For one page with defective memory cells, if we could reduce the programming voltage staircase ramp  $V_{pp}$  and hence improve the raw storage reliability, a certain degree of BCH code error-correction capability would be available for compensating those defects in the present page. Meanwhile, the reduced  $V_{pp}$  results in increased memory programming time. Let  $d_{def}$  denote the number of defective cells that can be compensated in one page. Table 3 lists the tradeoffs between programming time and defect tolerance for the BCH codes listed in Table 1.

Therefore, 3-D integration may further enable a new ML NAND Flash memory defect tolerance strategy by combining the conventional spare rows/columns repair and strong BCH codes. As illustrated in Fig. 7, one can first check whether the available spare rows/columns can repair all the defects in one block. If not, a certain repair algorithm is carried out to use the spare rows/columns to repair as many defects as possible so that the number of residual defective cells can be minimized. Then one calculates how to adjust  $V_{pp}$  to sufficiently improve the raw storage reliability. Finally, the feasibility of such adjustment is checked, subject to some practical constraints such as circuit precision and minimum allowable programming time.
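The decision flow described above can be sketched in code. The Python example below is a simplified illustration, not the procedure of Fig. 7 itself: it works on a single page, assumes each spare repairs exactly one defective cell (real row/column repair replaces whole lines), and looks up the programming-time penalty in the Table 3 entries instead of computing a $V_{pp}$ adjustment. All names are hypothetical.

```python
from typing import Optional, Tuple

# Table 3: levels -> list of (programming-time increase in %, d_def)
TABLE3 = {
    5: [(14.3, 1)],
    6: [(6.7, 5), (14.3, 9)],
    7: [(6.7, 25), (14.3, 41)],
    8: [(6.7, 70), (14.3, 129)],
}


def plan_page_repair(levels: int, defects: int, spares: int,
                     max_time_increase_pct: float) -> Tuple[str, Optional[float]]:
    """Return (strategy, programming-time increase in %) for one page."""
    # Step 1: can the spares alone repair every defect?
    if defects <= spares:
        return ("spares only", 0.0)
    # Step 2: use all spares, then cover the residual defects with the BCH code
    # by slowing down programming (reduced V_pp), smallest penalty first.
    residual = defects - spares
    for increase_pct, d_def in sorted(TABLE3.get(levels, [])):
        # Step 3: feasibility check against the allowed programming-time budget.
        if increase_pct <= max_time_increase_pct and d_def >= residual:
            return ("spares + reduced Vpp", increase_pct)
    return ("not repairable", None)


print(plan_page_repair(7, defects=3, spares=4, max_time_increase_pct=15.0))
print(plan_page_repair(7, defects=34, spares=4, max_time_increase_pct=15.0))
print(plan_page_repair(7, defects=34, spares=4, max_time_increase_pct=10.0))
```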

Table 3 Tradeoff Between Programming Time and Defect Tolerance

| l | (n, k, t) BCH Codes | Programming Time Increase | $d_{def}$ |
|---|---------------------|---------------------------|-----------|
| 5 | (32895, 32768, 8)   | 14.3%                     | 1         |
| 6 | (33519, 32768, 47)  | 6.7%                      | 5         |
|   |                     | 14.3%                     | 9         |
| 7 | (36367, 32768, 225) | 6.7%                      | 25        |
|   |                     | 14.3%                     | 41        |
| 8 | (44607, 32768, 740) | 6.7%                      | 70        |
|   |                     | 14.3%                     | 129       |

Fig. 7. Flow diagram of using strong BCH codes for defect tolerance.

# III. 3-D POWER DELIVERY

This section aims to study the application of 3-D integration technology to power delivery systems. Relevant challenges with conventional power delivery are reviewed, including power density, interconnect parasitics, passives, and packaging. A 3-D power delivery scheme is introduced based on a wafer-level 3-D integration platform, offering a number of advantages compared to conventional power delivery. Several design aspects are examined, particularly with emphasis on the cellular architecture of a power regulator array, vertical partitioning of circuit modules, and passive integration. Issues associated with 3-D power delivery are addressed.

## A. Conventional Power Delivery Challenges

Power delivery has become a major issue in deep submicrometer very large-scale integration designs due to the rapid increase in power density and clock frequency. Take the Intel Dual-Core Xeon processor as an example: it has a thermal design power of 150 W at 1.25 V core supply and 3.4 GHz top frequency [16]. Generally, this high power leads to a total current on the order of 100 A with rapid changes over short time periods (di/dt events). Methods for power saving, such as clock gating, reduce the average power but further increase the di/dt ratio.
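As a quick check of the current level quoted above, and treating the thermal design power as the power drawn at the core supply (an approximation), the supply current is

$$I = \frac{P}{V} = \frac{150\ \text{W}}{1.25\ \text{V}} = 120\ \text{A},$$

which is on the order of 100 A.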

Conventional power delivery methods for microprocessors, as shown in Fig. 8, have fundamental limitations in meeting the power requirements of future IC technologies [17], [18]. Usually, this power conversion system employs a voltage regulator module (VRM) mounted on the motherboard to convert the input power to the required supply rails. One power delivery challenge is to maximize the efficiency of the VRM. In general, the power efficiency  $\eta$  of the power conversion system is defined as the output power  $P_{\rm out}$  over the input power, which is the sum of the power loss  $P_{\rm loss}$  and the output power

$$\eta = \frac{P_{\rm out}}{P_{\rm out} + P_{\rm loss}}. \tag{2}$$

Besides the power losses from the control and gate drive circuits, which are generally small, the power loss of the power conversion system is the sum of the MOSFET power loss and the filter power loss

$$P_{\text{loss}} = P_{\text{conduction}} + P_{\text{switching}} + P_{\text{ind}} + P_{\text{cap}}. \tag{3}$$




Fig. 8. Conventional power delivery system.

The MOSFET power loss consists of conduction power loss  $P_{\text{conduction}}$  and dynamic switching loss  $P_{\text{switching}}$ . Conduction power loss is the power dissipated in the series resistance of the MOSFET operating in the active region. Dynamic switching loss is the power dissipated each switching cycle and is a function of capacitance, supply voltage, and switching frequency. The filter power loss includes the losses  $P_{\text{ind}}$  and  $P_{\text{cap}}$  from the inductor and capacitor, respectively.  $P_{\text{ind}}$  is due to the conductor series resistance and the stray capacitance.  $P_{\text{cap}}$  is due to the capacitor effective series resistance and is usually small compared to other loss mechanisms.
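To make (2) and (3) concrete, the Python sketch below adds up the four loss terms and evaluates the efficiency. The first-order expressions used for conduction loss ($I^2 R$) and switching loss ($C V^2 f$) are common textbook approximations assumed here, not formulas given in the text, and every numerical value is hypothetical and chosen only for illustration.

```python
def conduction_loss_w(i_rms_a: float, r_on_ohm: float) -> float:
    """First-order conduction loss I^2 * R in the MOSFET series resistance (assumed model)."""
    return i_rms_a ** 2 * r_on_ohm


def switching_loss_w(c_f: float, v_v: float, f_hz: float) -> float:
    """First-order dynamic switching loss C * V^2 * f (assumed model)."""
    return c_f * v_v ** 2 * f_hz


def efficiency(p_out_w: float, p_conduction_w: float, p_switching_w: float,
               p_ind_w: float, p_cap_w: float) -> float:
    """Efficiency per (2), with the total loss summed as in (3)."""
    p_loss_w = p_conduction_w + p_switching_w + p_ind_w + p_cap_w
    return p_out_w / (p_out_w + p_loss_w)


# Hypothetical operating point, for illustration only.
p_cond = conduction_loss_w(i_rms_a=100.0, r_on_ohm=0.5e-3)  # 5 W
p_sw = switching_loss_w(c_f=5e-9, v_v=12.0, f_hz=1e6)       # 0.72 W
eta = efficiency(p_out_w=125.0, p_conduction_w=p_cond, p_switching_w=p_sw,
                 p_ind_w=3.0, p_cap_w=0.5)
print(f"efficiency = {eta:.3f}")
```

Because the conduction term grows with the square of the current and the switching term grows linearly with frequency, the sketch illustrates why lower supply voltage at higher current and faster switching both push efficiency down.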

A chip power delivery system has to maintain steady power and ground rails during core switching [19]. As processor CMOS devices scale down, the supply voltage decreases, the supply current increases, and the VRM switching frequency increases to provide wide-bandwidth control. This results in higher power losses in  $P_{\text{conduction}}$,  $P_{\text{switching}}$, and  $P_{\text{ind}}$, making high power conversion efficiency more difficult to realize. Moreover, the sensitivity of ICs to voltage fluctuations increases as the supply voltage decreases. There are three more challenges related to conventional power delivery systems.

1) Pin Counts and Packaging: For high-performance applications such as microprocessors, package pin counts have grown rapidly because higher frequency, higher power density, and lower supply voltage demand more power and ground pins to meet requirements on supply voltage fluctuation, high current and current slew rate, and signal integrity. According to ITRS 2007 [20], for high-performance applications, the minimum cost per pin for contract assembly decreases slightly from 1.71 to 1.51 cents/pin between 2007 and 2013, while the pin count increases from 3371 to 4736 over the same time frame. Consequently, the total pin cost increases 1.24 times, while the chip size increases 1.13 times due to the increase in pin count. The majority of pins are for power and ground [16]. The pin count increase also leads to routing complexity, which prolongs design cycles through longer verification time and makes system debug even more arduous and costly.
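The 1.24 times figure follows from the ITRS numbers quoted above, taking the total pin cost as the pin count multiplied by the cost per pin:

$$\frac{4736 \times 1.51}{3371 \times 1.71} = \frac{7151.36}{5764.41} \approx 1.24.$$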

2) Long Interconnect Parasitics: Conventional power delivery architecture has long interconnects between the power conversion system and the IC chip. The parasitics of the long interconnect lead to voltage drop and generate large di/dt noise. The latter forces the use of a large number of decoupling capacitors at various locations along the power delivery path. As the supply voltage continues to drop and the current slew rate keeps increasing, these parasitics become a critical design constraint, and the conventional architecture cannot meet future needs.
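The two effects named above, resistive voltage drop and di/dt noise, can be put in rough numbers with the lumped relation $\Delta V = I R + L \, di/dt$. The Python sketch below uses that standard first-order model; the parasitic values and the current step are hypothetical and serve only to show how the two terms compare against a supply near 1 V.

```python
def supply_droop_v(current_a: float, r_ohm: float, l_h: float, di_dt_a_per_s: float) -> float:
    """Lumped first-order droop across the delivery path: I*R + L*di/dt."""
    return current_a * r_ohm + l_h * di_dt_a_per_s


# Hypothetical path parasitics and load step, for illustration only.
droop = supply_droop_v(current_a=100.0,      # steady load current
                       r_ohm=0.2e-3,         # 0.2 milliohm path resistance
                       l_h=5e-12,            # 5 pH effective loop inductance
                       di_dt_a_per_s=1e10)   # 10 A/ns current slew rate
print(f"droop = {droop * 1000:.0f} mV")
```

Shortening the path reduces both $R$ and $L$, which is the motivation for moving the regulator closer to the load.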

3) Passives: The output low-pass filter of a power conversion system, composed of an inductor and a capacitor, influences the efficiency of power delivery to a large extent when integrated on-chip. Large capacitors and high-value inductors consume large area, and the influence of losses related to passive components becomes significant, especially for the low-voltage high-current power distribution network [21]. For instance, an on-chip inductor typically results in an increased area, lower yield, and higher expense for an active die [22], although inductor size can be reduced at higher switching frequency. Moreover, the performance of on-chip inductors typically suffers from substrate loss due to the close proximity of the devices to a conductive silicon substrate.

In summary, solutions are needed to minimize parasitic resistance, inductance, and capacitance from the power conversion system to the on-chip power grid; increasing input voltage, moving the power regulator closer to the integrated system, or using on-chip inductors operating at higher frequency can alleviate the problems. These solutions have opened the possibility of integrating the power delivery system on a single silicon chip [23], [24] or in a 3-D chip stack [3], [4], [25]–[27].

## B. 3-D Power Delivery and Advantages

Several vertical power delivery integration approaches were proposed [11]–[17], such as vertically stacked dc–dc converters on a processor using flip-chip and through-hole packaging [25], z-axis quasi-vertical power delivery with a conventional discrete VRM [27], and a vertically packaged switched-mode power converter [28]. We proposed a 3-D architecture for power delivery to microprocessors and ASICs using monolithic wafer-level 3-D integration technology [3], [4].



Fig. 9. Examples of bonded interfaces for via-first and via-last 3-D integration platforms: (a) cross-section of bonded CMOS SOI wafer using BCB (for via-last 3-D) [31]; (b) cross-section of bonded damascene patterned Cu/BCB wafers (for via-first 3-D) [32].

Monolithic wafer-level 3-D integration technology uses two fully processed wafers aligned face-to-face or face-to-back [29]–[32] and bonded using either a dielectric adhesive [Fig. 9(a) [31]] or a metal/adhesive damascene-patterned layer [Fig. 9(b) [32]] under conditions compatible with CMOS processing. Wafer-level 3-D integration provides the lowest interconnect parasitics and low cost in high-quantity production with high reliability due to monolithic IC-type interconnectivity, enabling 3-D integration of the power delivery system with memory and processors in one 3-D stack.

Fig. 10 shows a 3-D power delivery system design. A VRM on a thinned substrate in the top stratum is vertically integrated with the processor using wafer-to-wafer bonding and TSVs, with a layer of interstratum inductors and capacitors in between. The VRM accepts a high input voltage and generates the different supply voltages, from several volts to below 1 V, required by different 3-D integrated systems such as microprocessors, ASICs, and I/Os. The high-performance interstratum inductors and capacitors can be fabricated separately with their optimized materials and processing; thus magnetic thin-film and high-k materials can be used to reduce the size of passive components and improve the performance.



Fig. 10. Schematic of a 3-D power delivery system with passives included between bonded wafers in three strata.


Compared to existing conventional power delivery architectures, 3-D power delivery offers a number of advantages for high-performance systems:

- minimum interconnect parasitics between the VRM and integrated systems, with interstratum vias only several micrometers long;
- wide control bandwidth of a high-switching-frequency power delivery system, enabling dynamically scalable voltages;
- heterogeneous integration of high-performance interstratum inductors and capacitors;
- cellular design of an on-chip power regulator array, enabling generation and distribution of multiple, individually regulated supply voltages for multicore and multifunction applications;
- significantly reduced power/ground package pin counts (I/Os);
- low cost and high reliability due to monolithic IC-type interconnectivity and tolerance of less efficient voltage regulators.

## C. Design Methodologies of 3-D Power Delivery

Three-dimensional integration of power supplies with microprocessors using 3-D IC technologies as discussed above requires a high-density VRM occupying a footprint area similar to that of the processor. Since only a limited amount of capacitance can be incorporated into the 3-D design, the power conversion system must also be able to maintain output voltage regulation in the face of rapid load current changes without requiring a large output filter capacitance. Both requirements point to high-performance, fully monolithic power conversion systems with control bandwidth far beyond today's discrete designs. The feasibility and performance of such a monolithic power conversion system for 3-D power delivery were demonstrated with a fully integrated, two-phase interleaved buck converter with linear feedback control in a 180 nm SiGe bipolar CMOS process [3], [4]. The converter operates at around 200 MHz switching frequency and achieves a control bandwidth of about 10 MHz. All passives, including the inductor and output filter capacitor, are monolithically integrated on the same chip as the power switches and the control circuitry. In the following, we discuss the design considerations for 3-D power delivery architecture, partitioning, passives, and power efficiency.

Fig. 11. Top view of cellular design of power regulator array die.
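To see why a wide control bandwidth relaxes the output capacitance requirement, a common first-order rule of thumb (not taken from the paper) estimates the voltage excursion after a load step as ΔV ≈ ΔI / (2π f_c C), where f_c is the control bandwidth. The sketch below applies it with the roughly 10 MHz bandwidth quoted above. The load step, the allowed droop, and the 100 kHz bandwidth used for a discrete design are illustrative assumptions.

```python
import math


def required_output_capacitance(load_step_a, allowed_droop_v, control_bandwidth_hz):
    """First-order estimate C ~ dI / (2*pi*f_c*dV); a rule of thumb, not the authors' model."""
    return load_step_a / (2.0 * math.pi * control_bandwidth_hz * allowed_droop_v)


# Hypothetical operating point: 1 A load step, 50 mV allowed droop.
LOAD_STEP_A = 1.0
ALLOWED_DROOP_V = 0.05

# Assumed bandwidth for a discrete VRM (illustrative only).
c_discrete = required_output_capacitance(LOAD_STEP_A, ALLOWED_DROOP_V, 100e3)
# Bandwidth of about 10 MHz, as reported for the monolithic converter.
c_monolithic = required_output_capacitance(LOAD_STEP_A, ALLOWED_DROOP_V, 10e6)

print(f"discrete:   {c_discrete * 1e6:.1f} uF")
print(f"monolithic: {c_monolithic * 1e9:.0f} nF")
```

Under this estimate the required capacitance scales inversely with control bandwidth, which is why a 100-fold wider bandwidth brings the filter capacitor into a range that is more plausible for on-chip or interstratum integration.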

1) Horizontal Arrangement of Unit Power Cells: A cellular architecture is envisioned to increase the flexibility and dynamic control of output voltage, enabling dynamic power supplies down to the functional circuit level, as shown in Fig. 11. Different cells can supply different voltages, typically in the range of $1 \sim 5$ V or lower, as required to power microprocessors, memory, ASICs, and I/Os, or to provide different voltages within the processor. This cellular architecture can also provide different currents with separate power/ground rails. The input passives can be shared by adjacent power cells to reduce cell area. Dynamic reconfiguration of cell connections may also be developed to allocate the number of cells that generate a given voltage according to the amount of current required.
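The dynamic allocation idea can be illustrated with a minimal sketch: each rail receives enough unit cells to cover its current demand, subject to the number of cells on the die. The function name, the rail names, the per-cell current rating, and the cell count are all hypothetical and serve only to make the bookkeeping concrete.

```python
import math


def allocate_cells(rail_currents_a, cell_current_a, total_cells):
    """Assign unit power cells to rails so each rail's current demand is covered.

    rail_currents_a: mapping of rail name -> required current in amperes.
    cell_current_a:  assumed current rating of one unit cell in amperes.
    total_cells:     number of unit cells available in the regulator array.
    """
    needed = {
        rail: math.ceil(current / cell_current_a)
        for rail, current in rail_currents_a.items()
    }
    if sum(needed.values()) > total_cells:
        raise ValueError("not enough unit cells for the requested currents")
    return needed


# Hypothetical demand: three rails served by an array of 16 cells rated 0.5 A each.
demand = {"core_1V0": 3.0, "memory_1V8": 1.2, "io_3V3": 0.6}
allocation = allocate_cells(demand, cell_current_a=0.5, total_cells=16)
print(allocation)
```

Rerunning the allocation whenever the demand changes is one simple way to picture reconfiguration; a real controller would also have to handle phase interleaving and rail switching, which this sketch ignores.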

2) Vertical Partitioning of System Modules: Fig. 10 shows one possible vertical partition of system modules, in which the passives (i.e., capacitors and inductors) are fabricated on a separate wafer. Other partitions are also possible. The goal of vertical partitioning is to make the most of the improved electrical performance (i.e., lower interconnect parasitics) offered by a high density of interdie/interwafer vias while accommodating new constraints in via density, testability, and thermal design. For instance, consider Intel's future terascale research chip, an assembly of an 80-core microprocessor array arranged in a 2-D mesh with 3-D memory stacking dedicated to each core unit [33]. With conventional power delivery, a large number of TSVs through the memory chip would be needed to supply power to the processor, as the majority of the package pins are for power/ground. With 3-D power delivery for such an application, in order to reduce the number of power/ground vias through the memory chip, unit power cells (with higher input voltage, and therefore fewer TSVs) and passives might have to be inserted between the memory chip and the processor chip, as well as on top of the memory chip. Moreover, the cellular architecture of VRMs can provide dynamic power to each processor core.
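The link between input voltage and TSV count follows from simple arithmetic: for a fixed power, a higher delivery voltage means less current, and therefore fewer vias if each via carries a bounded current. The sketch below assumes a hypothetical per-TSV current limit and power level; neither figure comes from the paper.

```python
import math


def power_tsv_count(power_w, delivery_voltage_v, max_current_per_tsv_a):
    """Number of supply TSVs needed to carry power_w at delivery_voltage_v.

    Counts the supply side only; the same number is needed for ground return.
    A tiny guard keeps floating-point round-off from adding a spurious via.
    """
    current_a = power_w / delivery_voltage_v
    return math.ceil(current_a / max_current_per_tsv_a - 1e-9)


# Hypothetical case: 10 W routed through the memory die, 50 mA allowed per TSV.
tsvs_at_1v = power_tsv_count(10.0, 1.0, 0.05)  # delivered at the core voltage
tsvs_at_5v = power_tsv_count(10.0, 5.0, 0.05)  # delivered at a higher VRM input voltage
print(tsvs_at_1v, tsvs_at_5v)
```

In this illustration, delivering the power at five times the voltage cuts the supply via count by a factor of five, at the cost of placing the regulating cells below the memory die.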

For 3-D integration of different functional components and technologies, such as wireless/mixed-signal circuits, memory, and processors for different purposes, integration and partitioning of the power delivery circuitry into the 3-D stack become even more essential. This is because different functional components may be fabricated using different technologies or technology nodes. For instance, analog circuits are usually fabricated a few technology nodes behind processors, and the perception is that deep-submicrometer technology might not be the right choice for certain analog circuits. Therefore, integration of multiple power cells into the 3-D stack can supply the multiple, individually regulated supply voltages that are needed while reducing the number of power/ground TSVs. In general, the VRM, passive devices, and highly integrated 3-D systems should be codesigned and partitioned in order to take full advantage of 3-D integration.

3) Integrated Passives: In order to meet future voltage regulation requirements during transients, more output filter and decoupling capacitors will be needed [34]. Magnetic thin-film and high-k materials can be used to reduce the size of passive components and improve their performance because 3-D allows interstratum inductors and capacitors to be fabricated separately with optimized materials and processing.

For inductors, increasing the switching frequency is a viable way to decrease the size of passive components for on-chip integration, although a higher switching frequency increases switching and gate charge/discharge losses and unwanted in-band interference, thus reducing the power efficiency and generating noise. Using a low-loss substrate, magnetic materials, and thicker winding metal can improve the power efficiency by increasing the L/R ratio of the interstratum inductor, where L is the inductance and R is the series resistance. The power loss of an inductor is given by

$$P_{\rm ind} = \left(I_L^2 + \frac{D^2(1-D)^2 V_{\rm in}^2}{3f_s^2 L^2}\right) R \tag{4}$$

where $I_L$, $D$, $V_{\rm in}$, and $f_s$ are load current, duty cycle, input voltage, and switching frequency, respectively. For an on-chip fully integrated spiral inductor, parasitic resistance increases dramatically with the increase of inductance; thus it is difficult to cut down the inductor power loss. Three-dimensional integration provides another dimension for the inductor with reduced resistance, area, and power loss, because 1) magnetic materials can be used to increase *L* without concerns over processing contamination and thermal budget, 2) thick metal winding wire can be fabricated to reduce *R*, and 3) structures such as solenoids or toroids can be introduced to further increase *L*, decrease area, and reduce the magnetic leakage.
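As a quick numerical check of how $L$ and $R$ enter (4), the sketch below evaluates the expression exactly as printed. The 200 MHz switching frequency matches the converter described earlier; the load current, duty cycle, input voltage, inductance, and resistance are hypothetical values chosen only for illustration.

```python
def inductor_power_loss(i_load, duty, v_in, f_sw, inductance, resistance):
    """Evaluate Eq. (4): P_ind = (I_L^2 + D^2 (1-D)^2 V_in^2 / (3 f_s^2 L^2)) * R."""
    ripple_term = (duty ** 2 * (1.0 - duty) ** 2 * v_in ** 2) / (
        3.0 * f_sw ** 2 * inductance ** 2
    )
    return (i_load ** 2 + ripple_term) * resistance


# Hypothetical operating point (only f_sw is taken from the text).
base = dict(i_load=0.5, duty=0.5, v_in=2.0, f_sw=200e6, inductance=10e-9, resistance=0.5)

p_base = inductor_power_loss(**base)
# Doubling L at constant R, as magnetic materials might allow in a separate stratum.
p_more_l = inductor_power_loss(**{**base, "inductance": 20e-9})
# Halving R at constant L, as a thicker winding metal might allow.
p_less_r = inductor_power_loss(**{**base, "resistance": 0.25})

print(f"baseline:  {p_base * 1e3:.1f} mW")
print(f"double L:  {p_more_l * 1e3:.1f} mW")
print(f"half R:    {p_less_r * 1e3:.1f} mW")
```

With these numbers the load-current term dominates, so reducing $R$ has the larger effect, while increasing $L$ only trims the ripple contribution. On a planar spiral the two cannot be varied independently, which is the difficulty the text describes.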

## D. Potential Issues With 3-D Power Delivery

Although power delivery using 3-D integration is promising, potential issues associated with efficient circuit design and operation should be considered, such as thermal and electromagnetic interference (EMI) issues.

1) Thermal Issue: The thermal problem has already affected the reliability and performance of high-performance 2-D ICs, and many approaches are being explored to address it. The problem may be aggravated in 3-D ICs because more transistors are packed together, resulting in higher power density, and because the materials between chips (such as on-chip interconnects and bonding materials) have poor thermal conductivity. Design partitioning is one way to address the issue: circuits that generate more heat (e.g., processors) are placed on the bottom layer, directly attached to the heat sink, and circuits that generate less heat (e.g., memories) are placed on top of the stack. Insertion of thermal TSVs and thermal spreaders between layers can also alleviate the thermal issues. A dynamic 3-D power supply with thermal feedback and control coupled to the processor may greatly reduce the overall thermal constraints in 3-D integration.
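The benefit of placing the hottest die next to the heat sink can be illustrated with a one-dimensional series thermal-resistance model. This is a deliberately crude sketch, not the authors' analysis: it assumes all heat leaves through the heat sink, ignores lateral spreading, and uses hypothetical power and resistance values.

```python
def stack_temperature_rises(powers_w, resistances_k_per_w):
    """Temperature rise of each layer above the heat sink in a 1-D stack.

    powers_w[i]:            power dissipated in layer i (layer 0 touches the heat sink).
    resistances_k_per_w[i]: thermal resistance between layer i and the layer below it
                            (or the heat sink for i = 0).
    """
    rises = []
    rise = 0.0
    for i, resistance in enumerate(resistances_k_per_w):
        # All heat generated in layer i and above must cross this interface.
        heat_through = sum(powers_w[i:])
        rise += resistance * heat_through
        rises.append(rise)
    return rises


# Hypothetical stack: 0.5 K/W to the heat sink, 2.0 K/W across the bonding layer.
RESISTANCES = [0.5, 2.0]
PROCESSOR_W, MEMORY_W = 20.0, 2.0

processor_on_sink = stack_temperature_rises([PROCESSOR_W, MEMORY_W], RESISTANCES)
memory_on_sink = stack_temperature_rises([MEMORY_W, PROCESSOR_W], RESISTANCES)

print("processor next to sink:", processor_on_sink)
print("memory next to sink:   ", memory_on_sink)
```

In this toy model the hottest layer is far cooler when the processor sits next to the heat sink, because only the small memory power has to cross the poorly conducting bonding interface.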

2) Electromagnetic Interference: EMI issues should also be investigated when integrating passive components in a 3-D architecture. Three-dimensional integration might have certain advantages in terms of EMI characteristics because interconnect distances are shortened and TSVs can be placed as shields around sensitive signal paths to further improve isolation. In addition, a copper metallization grid can be placed on the underlying substrate to provide a Faraday shield beneath the active devices and to serve as a good ground plane for stray field termination. For instance, the damascene-patterned metal/adhesive redistribution layers, which are used for high-density interwafer electrical interconnects and strong adhesive bonding of two wafers in a wafer-level via-first 3-D integration platform [4], can provide some shielding for EMI.

# IV. ON-CHIP 3-D INTEGRATED RF/OPTICAL WIRELESS COMMUNICATION CHIP FOR SENSOR NETWORK

Two wireless communication technologies, RF and free-space optics (FSO), possess unique characteristics and are complementary to each other. A hybrid RF and FSO wireless communication system in a sensor node offers a number of advantages, such as increased bandwidth, higher data transmission rate, lower power consumption, and increased security of the data link. Si is the dominant integration platform at present and will probably remain so in the next decade. Monolithically integrated RF wireless front-end circuits, including a voltage-controlled oscillator and antenna on a Si substrate, are attractive to designers [35]–[37]. However, the challenge is that 1) the dimension of an RF antenna is usually several orders of magnitude larger than typical transistor circuits and 2) the surface wave tends to confine the electromagnetic energy in the substrate rather than radiating it into air. Resonant antenna structures or carrier frequencies in the high gigahertz range have proven effective in greatly reducing antenna dimensions [38]. A number of techniques have been developed to reduce the surface wave in the substrate. Typical methods include electromagnetic bandgap structures in the substrate [39], [40], low effective dielectric material underneath the antenna area [41]–[43], and the use of soft-and-hard surface material [44]. In this context, we choose a quasi-Yagi antenna that is integrated on chip for the RF wireless links because of its compatibility with high-dielectric-constant substrates [45].

The miniaturized high-frequency antenna and the monolithically integrated optoelectronic devices make it possible to realize optoelectronic integrated circuits (OEICs) and monolithic microwave integrated circuits (MMICs) on the same chip. However, the challenge in the monolithic approach is similar to the dilemma that system-on-chip technology is facing. The amplifier circuits of an OEIC typically require epilayer growth profiles significantly different from those of optoelectronic devices. For example, Si CMOS is not compatible with monolithic integration of a light emitter or a long-wavelength photodetector because Si is an indirect-bandgap material and does not respond to wavelengths longer than 1 $\mu$m. Vertical integration through 3-D wafer bonding is particularly attractive in this respect because it offers an economical solution for complex circuits that incorporate devices of significantly different design features and material structures in a super die with an extremely small footprint. Thus, the system architecture can separate the antenna structure and optoelectronic devices from the transceiver circuits as well as the analog sensing chip through vertical interconnects.

In order to explore the 3-D integration more specifically, we considered a design of an RF/optical sensor chip consisting of four dies stacked vertically, as shown in Fig. 12. The top die consists of an end-fire quasi-Yagi antenna, an array of edge emitting laser diodes, and an array of edge viewing photodetectors on a SiGe substrate. The RF antenna serves as the front-end device for communication through RF channels, while the integrated on-chip laser diodes and photodetectors form the duplex, high-data-rate wireless optical communication links. The FSO data link and the RF channel are operated independently to allow simultaneous data transmission and reception for an autonomous sensor network. The second die in the stack comprises RF circuits, laser drivers, and photodetector amplifier circuits. The circuits are grouped together to complete the functions of RF and FSO wireless communication. Through the 3-D topology, the RF antenna and optoelectronic devices can be fabricated and optimized separately from their driving circuits in order to achieve their best performance. In addition, this approach is more cost effective because it shortens the design cycle. The RF and optoelectronic circuits connect to their front-end devices through vertical electrical vias. Compared to conventional packaging schemes such as wire bonding and flip-chip bonding, the vertical vias formed through wafer bonding introduce less parasitic capacitance. The third die in the stack is a sensor chip, and the fourth die provides energy harvesting capability, for instance, a solar cell or a vibration energy generator using microelectromechanical systems techniques. The 3-D super die described above is a hypothetical structure of a sensor node that possesses both RF and FSO communication capabilities. In practice, it is possible or even necessary in some cases to vary the sequence of die stacking and/or regroup the function blocks into different chip layers. In the following text, a more detailed analysis is provided for the top die, which consists of the RF antenna and the optoelectronic devices.

Fig. 12. Schematic illustration of a 3-D sensor node with hybrid RF/FSO wireless communication capability.

## A. RF Antenna Integration With Optoelectronic Devices

The top view of the integrated RF antenna with optoelectronic devices on the top die is shown in Fig. 13.



Fig. 13. A sketch of an on-chip modified quasi-Yagi antenna. The passive elements are replaced by arrays of optical devices.

An array of edge emitting laser diodes is placed in the position of the primary director of the quasi-Yagi antenna, while an array of edge viewing photodetectors (EVPDs) is placed where the secondary director of the antenna is located. The doping concentration and epilayer thickness of the SiGe wafer are optimized for long-wavelength photodetection. The EVPDs serve as the optical photoreceiver. The grouping of the photodetectors with the antenna is based on two considerations. First, the metallization of the antenna and the EVPDs can be processed simultaneously. Second, the epilayer thickness and doping are critical only for the photodetectors; hence the thickness of the photodetector substrate can be tailored for the best performance of the antenna. Compared to an MMIC, where the antenna is built on a CMOS or bipolar die, an additional benefit of the 3-D approach is that it avoids allocating a large percentage of the surface area of the CMOS chip to the antenna structure.

Monolithic integrated optical components on Si substrate have been actively pursued for decades, but there is no high-performance Si laser or modulator yet close to commercialization [46]. Heterogeneous integration is the most practical solution at present to add optical emitters onto Si-based electronic devices. Both crystalline Si and amorphous Si have been successfully made into commercial photodetectors for detection of light in wavelength of 0.4  $\sim$  1.0  $\mu$ m. Lightwave with  $\lambda = 1.0 \sim 1.6 \ \mu$ m is an important wavelength range because it not only covers the telecommunication wavelengths but also partially lies in the eye-safety spectrum for free-space optics. Ge and III-V compound semiconductors are commonly used for photodetection in this wavelength range. SiGe grown on Si substrate is adopted in this research for monolithic integrated photodetectors.

## B. On-Chip Quasi-Yagi Antenna

Evolved from the traditional Yagi–Uda antenna, a quasi-Yagi antenna is suitable for fabrication on a high-dielectric-constant substrate. The truncated ground plane reflector on the back side of the antenna utilizes surface waves to generate the fundamental mode of the electromagnetic wave for radiation. A broadband, compact quasi-Yagi antenna realized on Duroid material ($\epsilon_r = 10.2$) is reported to have achieved 3–5 dB absolute gain with a nominal efficiency of 93% [45]. Another advantage of a quasi-Yagi antenna is its compact size, which is approximately half of the central operation wavelength. For a carrier at ~30 GHz (Ka-band), the dimension of the antenna is on the order of 3 mm, an attribute crucial for monolithic integration on a Si platform.
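The half-wavelength rule can be checked numerically. The text does not state whether the free-space or the guided wavelength is meant, so the sketch below computes both; the effective permittivity of 3 is a placeholder assumption, not a value from the paper.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def half_wavelength_mm(freq_hz, eps_eff=1.0):
    """Half of the wavelength in millimeters; eps_eff = 1 gives the free-space value."""
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / (freq_hz * math.sqrt(eps_eff))
    return wavelength_m / 2.0 * 1e3


CARRIER_HZ = 30e9  # Ka-band carrier quoted in the text

free_space_mm = half_wavelength_mm(CARRIER_HZ)
guided_mm = half_wavelength_mm(CARRIER_HZ, eps_eff=3.0)  # assumed effective permittivity

print(f"half free-space wavelength: {free_space_mm:.2f} mm")
print(f"half guided wavelength:     {guided_mm:.2f} mm")
```

Both results are a few millimeters, consistent with the order of magnitude quoted above; the guided-wavelength figure moves closer to 3 mm as the assumed effective permittivity rises.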

## C. Heterogeneous Integration of Edge Emitting Laser Diodes

Dense integration is the key to obtaining optical signals with high emitting power for wireless communication. An array of edge emitting laser diodes can be heterogeneously integrated side-by-side on the SiGe substrate to enable optical signal transmission from the antenna chip. Direct-bandgap compound semiconductor materials are typically used to fabricate a high-speed laser diode. Compound laser diodes with the substrate removed or partially removed have been demonstrated for heterogeneous integration on a dissimilar substrate [47], [48]. The dimensions of the length and width of antenna director are critical for antenna radiation polarization and direction. At Ka-band, the length and width of the antenna are on the order of 2 and 0.5 mm, respectively, which are larger than a single die laser diode. Thus, an array of laser diodes is placed on the substrate, which has a total length equal to the antenna.

Fig. 14. Spatial arrangement of MSM PDs with edge emitting diodes.

## D. SiGe Edge-Viewing Photodetector Array

The optical wireless receiver consists of an array of EVPDs. The photodetectors can be in the form of waveguide metal-semiconductor-metal (MSM) or PiN photodiodes. Unlike the laser diodes, the photodetectors are fabricated directly on the SiGe substrate shared with the quasi-Yagi antenna. Dense integration of EVPD arrays increases the receiver alignment tolerance considerably. The metal pads of the array of EVPDs are placed in front of the primary antenna director and act as the secondary director, as shown in Fig. 14. The laser diodes and the EVPDs are arranged in alternating positions to avoid blocking of the laser emission by the photodetectors. The angle $2\theta$ is defined as the angle between two adjacent PDs as seen from the laser diode. This angle needs to be larger than the beam divergence of the laser diode.
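The clearance condition on $2\theta$ can be turned into a simple geometric check. The sketch below assumes a layout that the text does not specify: the laser facet is centered behind the gap between two adjacent PDs, at a known setback from the PD row. The gap, the setback, and the divergence value are hypothetical.

```python
import math


def clearance_angle_deg(pd_gap_um, laser_setback_um):
    """Full angle 2*theta subtended at the laser facet by the gap between adjacent PDs.

    Assumes the laser is centered on the gap and set back from the PD row.
    """
    return math.degrees(2.0 * math.atan(pd_gap_um / (2.0 * laser_setback_um)))


def beam_is_unblocked(pd_gap_um, laser_setback_um, beam_divergence_deg):
    """True when 2*theta exceeds the full beam divergence of the laser diode."""
    return clearance_angle_deg(pd_gap_um, laser_setback_um) > beam_divergence_deg


# Hypothetical layout: 200 um gap between PDs, laser facet 150 um behind them.
two_theta = clearance_angle_deg(200.0, 150.0)
print(f"2*theta = {two_theta:.2f} deg")
print("30 deg beam clears the PDs:", beam_is_unblocked(200.0, 150.0, 30.0))
print("80 deg beam clears the PDs:", beam_is_unblocked(200.0, 150.0, 80.0))
```

A check of this kind makes the layout trade-off explicit: packing the PDs more densely narrows the gap and reduces $2\theta$, so the achievable density is bounded by the divergence of the chosen laser diode.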

## E. Analysis of the Antenna/Optical Chip

Numerical simulation of the radiation pattern, return loss, and broadband gain of the quasi-Yagi antenna on SiGe was performed with CST Microwave Studio. The broadband gain of the antenna, both in the absence of optical elements and with the optical elements placed on the SiGe die, is plotted in Fig. 15. The antenna radiation gain is only slightly altered when optical elements are placed on the SiGe substrate. The vertical vias between dies likewise have little effect on the antenna radiation. This is because the optical elements are electrically small in comparison with the RF wavelength.

# V. CONCLUSION

As stated by Dr. C.-G. Hwang, president and CEO of Samsung Electronics, at the 2006 IEEE International Electron Devices Meeting, "... rapid adoption of 3-D integration technology seems to be essential and, thankfully, unavoidable." We believe that the semiconductor industry will shift to a new paradigm enabled by increasingly mature 3-D integration technologies. This certainly poses many challenges and opportunities for integrated system designers, who must embrace this emerging paradigm and learn how to exploit its potential most effectively. By presenting three preliminary case studies, we hope this paper will prompt more serious consideration and research effort from system design perspectives to help the 3-D integration era blossom.

Fig. 15. Broadband gain for the standard quasi-Yagi antenna (default), optical elements acting as quasi-Yagi director (square-coded line), and antenna configuration with vertical vias (triangle-coded line).

#### REFERENCES

- [1] R. S. Patti, "Three-dimensional integrated circuits and the future of system-on-chip designs," *Proc. IEEE*, vol. 94, pp. 1214–1224, Jun. 2006.
- [2] J.-Q. Lu, K. Rose, and S. Vitkavage. (2007, Jul.). 3D integration: Why, what, who, when? Future Fab Int., pp. 25–27. [Online]. Available: http://www.future-fab.com/
- [3] J.-Q. Lu, J. Sun, D. Giuliano, and R. J. Gutmann, "3D architecture for power delivery to microprocessors and ASICs," in Proc. 3rd Int. Conf. 3D Architect. Semiconduct. Integr. Packag., 2006.
- [4] J. Sun, J.-Q. Lu, D. Giuliano, T. P. Chow, and R. J. Gutmann, "3D power delivery for microprocessors and high-performance ASICs," in Proc. IEEE Appl. Power Electron. Conf., 2007, pp. 127–133.
- [5] C. Hwang, "Nanotechnology enables a new memory growth model," *Proc. IEEE*, vol. 91, pp. 1765–1771, Nov. 2003.
- [6] R. Micheloni et al., "A 4 Gb 2 b/cell NAND flash memory with embedded 5 b BCH ECC for 36 MB/s system read throughput," in Proc. Int. Solid State Circuits Conf. (ISSCC), 2006.
- [7] K. Imamiya et al., "A 125-mm<sup>2</sup> 1-Gb NAND flash memory with 10-MByte/s program speed," *IEEE J. Solid-State Circuits*, vol. 37, pp. 1493–1501, Nov. 2002.
- [8] J. Lee et al., "A 1.8 V 2 Gb NAND flash memory for mass storage applications," in Proc. IEEE Int. Solid-State Circuits Conf., 2003, p. 290.
- [9] T. Hara et al., "A 146-mm<sup>2</sup> 8-Gb multi-level NAND flash memory with 70-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, pp. 161–169, Jan. 2006.
- [10] K. Takeuchi et al., "A 56-nm CMOS 99-mm<sup>2</sup> 8-Gb multi-level NAND flash memory with 10-MB/s program throughput," *IEEE J. Solid-State Circuits*, vol. 42, pp. 219–232, Jan. 2007.
- [11] G. Campardo, R. Micheloni, and D. Novosel, VLSI-Design of Non-Volatile Memories. New York: Springer, 2005.
- [12] H. Nobukata et al., "A 144-Mb, eight-level NAND flash memory with optimized pulsewidth programming," *IEEE J. Solid-State Circuits*, vol. 35, pp. 682–690, May 2000.
- [13] H. O. Burton, "Inversionless decoding of binary BCH codes," *IEEE Trans. Inf. Theory*, vol. IT-17, pp. 464–466, Jul. 1971.
- [14] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge, U.K.: Cambridge Univ. Press, 2003.
- [15] S. B. Wicker and V. K. Bhargava, Reed-Solomon Codes and Their Applications. New York: IEEE Press, 1994.
- [16] R. Rusu et al., "A 65-nm dual-core multithreaded Xeon processor with 16-MB L3 cache," *IEEE J. Solid-State Circuits*, vol. 42, pp. 17–25, 2007.
- [17] O. P. Mandhana and J. Zhao, "Power delivery system performance optimization of a printed circuit board with multiple microprocessors," in *Proc. Electron. Comp. Technol. Conf.*, 2004, pp. 581–588.

- [18] B. Beker and T. Hirsch, "Tradeoffs in modeling the response of power delivery systems of high-performance microprocessors," in *IEEE Conf. EPEP*, 2000, pp. 77–80.
- [19] X. Wu and C. Desai, Delivering power integrity solutions for advanced IC packages. [Online]. Available: http://www.ansys.com/products/icemax/Delivering\_Power\_with\_Confidence.pdf
- [20] Semiconductor Industry Association, The International Technology Roadmap for Semiconductors (ITRS). [Online]. Available: http://www.itrs.net/reports.html
- [21] M. H. F. Lim, Z. Liang, and J. D. van Wyk, "Low profile integratable inductor fabricated based on LTCC technology for microprocessor power delivery applications," in *Proc. IEEE Appl. Power Electron. Conf.*, 2006, p. 7.
- [22] Y. Chen et al., "Design and simulation of on-chip RF magnetic medium inductors," in Proc. Int. Conf. Solid-State Integr. Circuit Technol., 2006, pp. 587–589.
- [23] M. Swaminathan, J. Kim, I. Novak, and J. P. Libous, "Power distribution networks for system-on-package: Status and challenges," *IEEE Trans. Adv. Packag.*, vol. 27, pp. 286–300, 2004.
- [24] G. Schrom et al., "Optimal design of monolithic integrated DC-DC converters," in Proc. IEEE Int. Conf. Integr. Circuit Design Technol., 2006, pp. 1–3.
- [25] G. Schrom et al., "Feasibility of monolithic and vertical-stacked dc-dc converters for microprocessors in 90 nm technology generation," in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 263–268.
- [26] Y. L. Li et al., "Enhancing power distribution system through vertical integrated models, optimized designs, and switching VRM model," in Proc. Electron. Comp. Technol. Conf., 2000, pp. 272–277.
- [27] J. A. Harrison and E. R. Stanford, "Z-axis processor power delivery system," U.S. Patent 6 523 253.
- [28] S. Chandrasekaran, J. Sun, and V. Mehrotra, "Vertically packaged switched-mode power converter," U.S. Patent 7 012 414.
- [29] J.-Q. Lu et al., "3D integration using wafer bonding," in Proc. Adv. Metal. Conf., 2001, pp. 515–521.
- [30] J.-Q. Lu, T. S. Cale, and R. J. Gutmann, "Wafer-level three-dimensional hyper-integration technology using dielectric adhesive wafer bonding," in Materials for Information Technology: Devices, Interconnects and Packaging, E. Zschech, C. Whelan, and T. Mikolajick, Eds. London, U.K.: Springer-Verlag, Aug. 2005, pp. 386–397.
- [31] R. J. Gutmann et al., "A wafer-level 3D IC technology platform," in Proc. Adv. Metallization Conf., 2003, pp. 19–26.
- [32] J. J. McMahon, J.-Q. Lu, and R. J. Gutmann, "Wafer bonding of damascene-patterned metal/adhesive redistribution layers for via-first three-dimensional (3D) interconnect," in *Proc. IEEE Electron. Comp. Technol. Conf. (ECTC)*, 2005, pp. 331–336.

- [33] J. Bautista, "Tera-scale computing the role of interconnects in volume compute platforms," in Proc. IEEE Conf. IITC, 2007, pp. 187–189.
- [34] T. Karnik et al., "High-frequency DC-DC conversion: Fact or fiction," in Proc. IEEE Int. Symp. Circuits Syst., 2006, pp. 245–248.
- [35] E. Erdogan et al., "A single-chip quad-band GSM/GPRS transceiver in 0.18 μm standard CMOS," in Proc. Int. Solid-State Circuits Conf., 2004, pp. 318–320.
- [36] B. Razavi, *RF Microelectronics*. Englewood Cliffs, NJ: Prentice-Hall, 1998.
- [37] H. Hashemi et al., "A fully integrated 24 GHz 8-path phased-array receiver in silicon," in Proc. IEEE Int. Solid-State Circuits Conf., 2004, p. 390.
- [38] J.-J. Lin et al., "Integrated antennas on silicon substrates for communication over free space," *IEEE Electron Device Lett.*, vol. 25, pp. 196–198, 2004.
- [39] R. Gonzalo, P. DeMaagt, and M. Sorolla, "Enhanced patch-antenna performance by suppressing surface waves using photonic-bandgap substrates," *IEEE Trans. Microwave Theory Tech.*, vol. 47, pp. 2131–2138, Nov. 1999.
- [40] J. S. Colburn and Y. Rahmat-Samii, "Patch antennas on externally perforated high dielectric constant substrates," *IEEE Trans. Antennas Propag.*, vol. 47, pp. 1785–1794, Dec. 1999.
- [41] D. R. Jackson, J. T. Williams, and A. K. Bhattacharyya, "Microstrip patch designs that do not excite surface waves," *IEEE Trans. Antennas Propag.*, vol. 41, pp. 1026–1037, Aug. 1993.
- [42] J. Papapolymerou, R. F. Drayton, and L. P. B. Katehi, "Micromachined patch antennas," *IEEE Trans. Antennas Propag.*, vol. 46, pp. 275–283, Feb. 1998.
- [43] R. B. Waterhouse, "Stacked patches using high and low dielectric constant material combinations," *IEEE Trans. Antennas Propag.*, vol. 47, pp. 1767–1771, 1999.
- [44] R. Li, G. DeJean, M. M. Tentzeris, J. Papapolymerou, and J. Laskar, "FDTD analysis of patch antennas on high dielectric-constant substrates surrounded by a soft-and-hard surface," *IEEE Trans. Magn.*, vol. 40, pp. 1444–1447, 2004.
- [45] W. R. Deal et al., "A new quasi-Yagi antenna for planar active antenna arrays," *IEEE Trans. Microwave Theory Tech.*, vol. 48, pp. 910–918, 2000.
- [46] G. T. Reed and A. P. Knights, Silicon Photonics: An Introduction. New York: Wiley, 2004.
- [47] H.-F. Kuo et al., "InP/InGaAsP MQW thin film edge emitting lasers for embedded waveguide chip to chip optical interconnections," in Proc. IEEE LEOS Annu. Meeting, 2003, pp. 63–64.
- [48] Z. Huang et al., "Embedded optical interconnections using thin film InGaAs metal-semiconductor-metal photodetector," *Electron. Lett.*, vol. 38, pp. 1708–1709, 2002.

## Acknowledgment

J.-Q. Lu and G. Zhang would like to acknowledge Prof. R. J. Gutmann, Prof. K. Rose, and Prof. J. Sun for their contributions in 3-D power delivery; and MARCO, DARPA, and NYSTAR for their support through the Interconnect Focus Center.

#### ABOUT THE AUTHORS

**Tong Zhang** (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Xian Jiaotong University, Xian, China, in 1995 and 1998, respectively, and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2002.

Currently he is an Associate Professor in the Electrical, Computer and Systems Engineering department, Rensselaer Polytechnic Institute, Troy, NY. His current research interests include algorithm and architecture codesign for communication and data storage systems, 3-D integrated microelectronic system design, variation-tolerant signal processing IC design, fault-tolerant system design for digital memory, and interconnect system design for hybrid CMOS/nanodevice electronic systems.

Prof. Zhang is currently an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: ANALOG AND DIGITAL SIGNAL PROCESSING.

**Rino Micheloni** (Senior Member, IEEE) received the Laurea degree (*cum laude*) in nuclear engineering from the Politecnico di Milano, Milan, Italy, in 1994.

In 1995, he joined the Memory Product Group, STMicroelectronics, Agrate, Italy, where he was the Product Development Manager of the NOR MLC Flash products for code and data storage applications. From 2002 to 2006, he led the NAND multilevel Flash activities and the error correction code development team. In late 2006, he joined Qimonda Flash GmbH, Germany, as Senior Principal for Flash Design. Currently, he is with Qimonda Italy, Vimercate, leading the design center activities. He is the coauthor of more than 20 papers in international journals or conferences. He is coauthor of *VLSI-Design of Non-Volatile Memories* (Berlin, Germany: Springer-Verlag, 2005); *Memories in Wireless Systems* (Milan, Italy: Franco Angeli, 2005); and *Error Correction Codes for Non-Volatile Memories* (Berlin, Germany: Springer-Verlag, 2008). He is author/coauthor of more than 100 patents (82 granted in the United States).

Mr. Micheloni was Co-Guest Editor for the PROCEEDINGS OF THE IEEE Special Issue on Flash Memory, April 2003. He received the STMicroelectronics Exceptional Patent Award in 2003 and 2004.

**Guoyan Zhang** received the Ph.D. degree in electrical and electronic engineering from Peking University, China, in 2002.

From 2002 to 2005, she was an Associate Professor at Peking University. She was then a Research Assistant with the National University of Maynooth, Ireland, and a Postdoctoral Associate with Rensselaer Polytechnic Institute, Troy, NY. She is currently with Sensitron Semiconductor, Long Island, NY. She works mainly in the area of microelectronics, with emphasis on devices, circuit design, and 3-D integration.

**Zhaoran Rena Huang** received the B.Sc. degree in optoelectronics from Beijing Institute of Technology, China, in 1995, and the M.Sc. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1999 and 2003, respectively.

She was a Postdoctoral Fellow with Microsystem Packaging Research Center from January to September 2004, where she participated in research on a prototype demonstration of RF/opto/digital system-on-package for a miniaturized multifunctional electronic module. She is now an Assistant Professor at Rensselaer Polytechnic Institute, Troy, NY. Her current research interests include RF/optical devices integration, biosensor, and Si photonics.



**James Jian-Qiang Lu** (Senior Member, IEEE) received the Dr.Rer.Nat. (Ph.D.) degree from the Technical University of Munich, Germany, in 1995.

He is an Associate Professor at Rensselaer Polytechnic Institute (RPI), Troy, NY, where he has been leading research programs of 3-D hyperintegration technology since 1999, with focus on 3-D hyperintegration architecture, platform, and technology development as well as micro-nano-bio interfaces for future chips. He has broad research experience, from micro/nanoelectronics theory and design to materials, processing, devices, integration, and packaging (e.g., Si, GaAs, and GaN devices, novel field-effect transistors, terahertz electronics, carbon nanotubes, and Si IC interconnects). He has authored/coauthored more than 170 publications in refereed journals, conferences, or books, and given a number of invited presentations, seminars, and short courses. He was a Technical Chair, Workshop Chair, Session Chair, Panelist, and Panel Moderator for many conferences.

Prof. Lu is a member of several professional societies and of the National Technical Committee (Chair for 3-D Packaging), International Microelectronics and Packaging Society. He has received many awards, including the School of Engineering Research Excellence Award from RPI in 2005 and the IEEE CPMT Exceptional Technical Achievement Award in 2008.