# High-Speed Analog-to-Digital Converters for Broadband Applications 

by

Ayman H. Ismail

A thesis<br>presented to the University of Waterloo in fulfillment of the<br>thesis requirement for the degree of<br>Doctor of Philosophy<br>in<br>Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2007
© Ayman H. Ismail 2007

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.


#### Abstract

Flash Analog-to-Digital Converters (ADCs) targeting optical communication standards have been reported in SiGe BiCMOS technology. CMOS implementation of such designs faces two challenges. The first is to achieve a high sampling speed, given the lower gain-bandwidth (lower $f_{t}$) of CMOS technology. The second is to handle the wide bandwidth of the input signal with a certain accuracy. Although the first problem can be relaxed by using the time-interleaved architecture, the second remains a main obstacle to CMOS implementation. As a result, the feasibility of a CMOS implementation of ADCs for such applications, or for other wide-band applications, depends primarily on achieving a very small input capacitance (large bandwidth) at the desired accuracy.

In the flash architecture, the input capacitance is traded off for the achievable accuracy. This tradeoff becomes tighter with technology scaling. An effective way to ease this tradeoff is to use resistive offset averaging. This permits the use of smaller-area transistors, leading to a reduction in the ADC input capacitance. In addition, interpolation can be used to decrease the input capacitance of flash ADCs. In an interpolating architecture, the number of ADC input preamplifiers is reduced significantly, and a resistor network interpolates the missing zero-crossings needed for an $N$-bit conversion. The resistive network also averages out the preamplifier offsets. Consequently, an interpolating network also works as an averaging network.

The resistor network used for averaging or interpolation causes a systematic nonlinearity at the ADC transfer characteristics edges. The common solution to this problem is to extend the preamplifiers array beyond the input signal voltage range by using dummy preamplifiers. However, this demands a corresponding extension of the flash ADC reference-voltage resistor ladder. Since the voltage headroom of the reference ladder is considered to be a main bottleneck in the implementation of flash ADCs in deep-submicron technologies with reduced supply voltage, extending the reference voltage beyond the input voltage range is highly undesirable.


The principal objective of this thesis is to develop a new circuit technique to enhance the bandwidth-accuracy product of flash ADCs. Thus, first, a rigorous analysis of the accuracy-bandwidth tradeoff of flash ADC architectures is presented. It is demonstrated that the interpolating architecture achieves a superior accuracy compared to that of a full flash architecture for the same input capacitance, and hence would lead to a higher bandwidth-accuracy product, especially in deep-submicron technologies that use low supply voltages. Also, the gain obtained when interpolation is employed is quantified. In addition, the limitations of a previous claim, which suggests that an interpolating architecture is equivalent to an averaging full flash architecture that trades off accuracy for the input capacitance, are presented. Secondly, a termination technique for the averaging/interpolation network of flash ADC preamplifiers is devised. The proposed technique maintains the linearity of the ADC at the transfer characteristics edges and cancels out the over-range voltage consumed by the dummy preamplifiers. This makes flash ADCs more amenable to integration in deep-submicron CMOS technologies. In addition, the elimination of this over-range voltage allows a larger least-significant bit. As a result, a higher input-referred offset is tolerated, and significant reductions in the ADC input capacitance and power dissipation are achieved at the same accuracy. Unlike a previous solution, the proposed technique does not introduce negative transconductance at the edges of the flash ADC preamplifier array. As a result, the offset averaging technique can be used efficiently.

To demonstrate the savings in ADC input capacitance and power dissipation attained by the proposed termination technique, a 6-bit 1.6-GS/s flash ADC test chip is designed and implemented in 0.13-$\mu\mathrm{m}$ CMOS technology. The ADC consumes 180 mW from a 1.5-V supply and achieves a Signal-to-Noise-plus-Distortion Ratio (SNDR) of 34.5 dB and 30 dB at 50-MHz and 1450-MHz input signal frequencies, respectively. The measured peak Integral-Non-Linearity (INL) and Differential-Non-Linearity (DNL) are 0.42 LSB and 0.49 LSB, respectively.

## Acknowledgements

All Praise is due to Allah, Most Gracious, Most Merciful, Whose help and guidance is ever dominating throughout my life.

I would like to express my gratitude to my supervisor, Prof. M. I. Elmasry, for his continuous support, guidance, and generosity in funding my research work, which allowed me to build a complete high-speed testing setup in the VLSI lab of the University of Waterloo. I would also like to express my deepest thanks to Prof. David Nairn for agreeing to co-supervise my research work in its last stages. Having him as a co-supervisor was a source of both pleasure and honour to me.

The work of this thesis was enriched by the many technical discussions that I had with many people. In particular, I would like to thank Mohamed El-Said for his help with high-speed design issues and modeling, and Mohamed Nummer for his tremendous technical support when the time came to prepare the testing setup. My deep appreciation goes to all my colleagues in the VLSI lab, in particular Hassan Hassan, who never failed to solve my PC problems, and whose presence in the lab always spread joy among all students.

Special thanks to Krish Nagaraj, Mohamed Kamal, and Neeraj Nayak from Texas Instruments, who provided me with industry experience and made my stay at Texas Instruments a useful and enjoyable experience. I would also like to thank Bill Jolley from the CIRFE of the University of Waterloo for wirebonding my chips.

My deepest gratitude goes to my mother and father for their continuous encouragement and prayers. I will never be able to pay back what they did for me. My deepest thanks go to my wife for her strong support during my PhD studies, and for her patience when I had to work without rest through too many weekends before the chip submission deadline and the thesis submission.

## Contents

- 1 Introduction
  - 1.1 Thesis Contributions
  - 1.2 Thesis Organization
- 2 High-Speed Analog-to-Digital Converters
  - 2.1 Analog-to-Digital Converters' Performance Metrics
    - 2.1.1 DC Specifications
    - 2.1.2 Dynamic Specifications
  - 2.2 Comparators
  - 2.3 The Flash Analog-to-Digital Converters
    - 2.3.1 The Interpolating Flash Architecture
    - 2.3.2 Capacitive Interpolation and Capacitive Generation of Reference Voltages
    - 2.3.3 The Folding Architecture
    - 2.3.4 The Folding-Interpolating Architecture
    - 2.3.5 Calibration of Flash ADCs
  - 2.4 The Two-Step Analog-to-Digital Converters
  - 2.5 The Two-step Subranging Analog-to-Digital Converters
  - 2.6 Pipelined Analog-to-Digital Converters
  - 2.7 Analog-to-Digital Converters Figures of Merit
- 3 Flash ADC Design for a Wide Bandwidth
  - 3.1 The Bandwidth-Accuracy Tradeoff of Flash ADCs
    - 3.1.1 Time-Interleaving of Flash ADCs
  - 3.2 Analysis of the Interpolating Architecture
    - 3.2.1 The $\times 2$ Interpolating Architecture
    - 3.2.2 Architectures with a Higher Interpolation Factor
  - 3.3 Preamplifiers Effective Gain in an Interpolating Architecture
- 4 Coping with the Lower Supply Voltages in Deep-Submicron Technologies
  - 4.1 Previous Solutions
    - 4.1.1 Reducing the Over-Range Voltage Headroom by Altering Averaging Resistors Value
    - 4.1.2 Over-Range Voltage Headroom Elimination by Triple Cross-Connection
  - 4.2 The Proposed Termination Technique
    - 4.2.1 Concept
    - 4.2.2 Circuit Level Implementation
- 5 A 6-bit 1.6-GS/s Low Power Broadband Flash ADC Converter in 0.13-$\mu \mathrm{m}$ CMOS Technology
  - 5.1 The Analog Front End
    - 5.1.1 The Track-and-Hold Circuit
    - 5.1.2 The Reference Ladder
    - 5.1.3 Preamplification Stages
  - 5.2 The Digital Back End
    - 5.2.1 Comparators and Latches
    - 5.2.2 The Digital Logic
- 6 Measurements
  - 6.1 Testing Setup
  - 6.2 Measurement Results
- 7 Conclusions
- A The Impulse Response of a $\times 2$ Interpolating Network Treated as a Spatial Filter

## List of Tables

- 5.1 Voltage gain, 3-dB bandwidth, input referred offset, and offset reduction factor of each preamplifier stage.
- 6.1 ADC performance summary.
- 6.2 6-bit ADCs comparison.

## List of Figures

- 2.1 Analog-to-digital conversion.
- 2.2 Offset generation mechanisms in a typical comparator.
- 2.3 Offset cancellation techniques: (a) input offset storage, (b) output offset storage, and (c) multistage offset cancellation.
- 2.4 The full-flash architecture.
- 2.5 Averaging the output of preamplifiers.
- 2.6 The interpolating architecture. Interpolation factor $=4$.
- 2.7 Preamplifiers' transfer characteristics and interpolated transfer characteristics for two cases: (a) the preamplifier's linear range does not extend to the zero-crossing of the neighbouring preamplifier, and (b) the preamplifier's linear range extends to the zero-crossing of the neighbouring preamplifier.
- 2.8 Capacitive interpolation.
- 2.9 The folder transfer characteristics.
- 2.10 The folding architecture, folding factor $=4$.
- 2.11 A CMOS folding circuit.
- 2.12 The folding-interpolating architecture.
- 2.13 The cascaded folding-interpolating architecture.
- 2.14 Foreground calibration of comparators used in flash ADCs: (a) applying current at the input, (b) applying current at the output, and (c) varying output capacitance.
- 2.15 Background calibration of comparator offset.
- 2.16 The two-step ADC architecture.
- 2.17 The two-step subranging architecture.
- 2.18 Conceptual block diagram of pipelined ADC.
- 2.19 The ADC Figure of Merit $\mathrm{FOM}_{1}$ as a function of technology feature size.
- 2.20 The ADC Figure of Merit $\mathrm{FOM}_{2}$ as a function of technology feature size.
- 3.1 Preamplifiers array edge problem: (a) transfer characteristics of preamplifiers, and (b) INL vs. output code.
- 3.2 Connecting the outputs of preamplifiers to average-out offset.
- 3.3 Voltage headroom distribution.
- 3.4 $\left(\frac{A_{Vt}^{2} C_{ox}}{V_{DD}^{2}}\right)$ vs. technology minimum feature size $(L)$.
- 3.5 The highest reported ERBW for 6-bit single channel flash ADCs in different technologies.
- 3.6 (a) The array of preamplifiers and averaging network of a $\times 2$ interpolating architecture modeled as a spatial filter. (b) Impulse response calculation.
- 3.7 Current stimulus due to the input signal to the spatial filter formed of the interpolating network.
- 3.8 Offset reduction ratios $\xi_{\text{Flash}}$ and $\xi_{\times 2 \text{ITPL}}$ vs. $\frac{R_{1}}{R_{0}}$ for $W_{\text{Lin}}=17$.
- 3.9 $m \xi_{\text{Flash}}^{2}$ and $m \xi_{\times 2 \text{ITPL}}^{2}$ vs. $\frac{R_{1}}{R_{0}}$ for $W_{\text{Lin}}=17$. $m=1$ for a full-flash architecture and $m=2$ for a $\times 2$ interpolating architecture.
- 3.10 Percentage improvement in accuracy-bandwidth tradeoff due to $\times 2$ interpolation $(\Delta \kappa \%)$ vs. $\frac{R_{1}}{R_{0}}$: (a) $W_{\text{Lin}}=17$, and (b) $W_{\text{Lin}}=33$.
- 3.11 Lumping full-flash architecture preamplifiers to form an interpolating architecture.
- 3.12 Percentage improvement in accuracy-bandwidth tradeoff due to $\times 4$ interpolation $(\Delta \kappa \%)$ vs. $\frac{R_{1}}{R_{0}}$: (a) $W_{\text{Lin}}=17$, and (b) $W_{\text{Lin}}=33$.
- 3.13 The effective gain of the preamplifiers in the presence of the averaging network normalized to the isolated preamplifier DC gain vs. $\frac{R_{0}}{R_{1}}$.
- 3.14 The reduction in the effective gain of preamplifiers due to interpolation as a function of $\frac{R_{1}}{R_{0}}$ for the case of $\times 2$ and $\times 4$ interpolation.
- 4.1 (a) The model of the averaging network and preamplifiers for an infinite array, (b) equal subcircuit currents flowing in the case of an infinite array, and (c) the model for a finite array.
- 4.2 Transfer characteristics of the edge preamplifiers and the interface amplifier.
- 4.3 (a) Terminating the averaging network using an interface amplifier and pre-distorted reference ladder, and (b) terminating the averaging network using an interface amplifier only.
- 4.4 Averaging network termination using dummy preamplifiers.
- 4.5 The termination technique of [22]. (a) Actual circuit. (b) Simplified model.
- 4.6 The proposed termination scheme: (a) actual circuit, and (b) simplified model.
- 4.7 The preamplifier.
- 4.8 The interface amplifier.
- 5.1 Analog front end of the ADC.
- 5.2 The T/H circuit.
- 5.3 ADC SNDR vs. T/H circuit SDR assuming a 6-bit ideal quantizer following the T/H circuit.
- 5.4 The sampled signal at the output of the track-and-hold buffer.
- 5.5 Spectrum of the sampled signal.
- 5.6 First stage preamplifier.
- 5.7 Input referred offset of first stage preamplifier.
- 5.8 The interface amplifier.
- 5.9 The bias circuit for the preamplifiers and interface amplifier.
- 5.10 The preamplifier of the second, third, and fourth stages.
- 5.11 INL profile obtained from simulation when using the proposed technique.
- 5.12 Digital back end of the ADC.
- 5.13 First stage comparator.
- 5.14 The CMOS latch.
- 5.15 Input referred offset of first stage comparator.
- 5.16 Clocked 3-input AND gate to generate ROM address.
- 5.17 The pre-charged ROM.
- 5.18 TSPC flip-flop used to hold the output of the ROM.
- 5.19 Timing diagram of the digital back end.
- 6.1 Microphotograph of the chip.
- 6.2 Chip mounted on PCB for testing.
- 6.3 Testing setup.
- 6.4 Measured signal spectrum for an input frequency of 50.04 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.
- 6.5 Measured signal spectrum for an input frequency of 800.04 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.
- 6.6 Measured signal spectrum for an input frequency of 1450.008 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.
- 6.7 Measured INL and DNL at 1.6 GS/s.
- 6.8 Measured SNDR and SFDR at 1.6 GS/s.
- 6.9 The input signal frequency at 5 ENOB vs. sampling frequency for previously reported 6-bit flash ADCs and this work.
- 6.10 Figure-of-merit $\left(\mathrm{FOM}_{1}\right)$ for previously reported 6-bit ADCs and this work.
- 6.11 Figure-of-merit $\left(\mathrm{FOM}_{2}\right)$ for previously reported 6-bit ADCs and this work.

## List of Acronyms

| ADC | Analog-to-Digital Converter |
| :--- | :--- |
| CBSC | Comparator-Based Switched-Capacitor |
| DAC | Digital-to-Analog Converter |
| DNL | Differential-Non-Linearity |
| DSP | Digital-Signal-Processing |
| ENOB | Effective Number Of Bits |
| ERBW | Effective Resolution BandWidth |
| FET | Field Effect Transistor |
| FOM | Figure-Of-Merit |
| FsOM | Figures-Of-Merit |
| GBW | Gain-BandWidth product |
| INL | Integral-Non-Linearity |
| IOS | Input-Offset-Storage |
| LSB | Least-Significant-Bit |
| MIM | Metal-Insulator-Metal |
| MSB | Most-Significant-Bit |
| OOS | Output-Offset-Storage |
| SAR | Successive-Approximation |
| SFDR | Spurious-Free-Dynamic-Range |
| SNDR | Signal-to-Noise-plus-Distortion ratio |
| T/H | Track-and-Hold |
| TSPC | True-Single-Phase-Clock |
| UWB | Ultra-Wide-Band |

## List of Symbols

| Symbol | Description |
| :--- | :--- |
| $A_{Vt}$ | Mismatch coefficient for the FET threshold voltage |
| $\beta$ | FET current factor |
| $A_{\beta}$ | Mismatch coefficient for the FET current factor |
| $A_{eff}$ | Effective gain of preamplifier in presence of averaging or interpolating network |
| $g_{m}^{\prime}$ | Effective transconductance of preamplifier in presence of averaging or interpolating network |
| $W_{\text{Lin}}$ | Number of spatial filter taps within the linear range of the preamplifier |
| $W_{n}$ | Width of the offset current stimulus |
| $W_{T}$ | Spatial width of the entire array |
| $\sigma_{\text{offset}}$ | Input referred static offset of preamplifier before averaging |
| $\sigma_{\text{offset}}^{\prime}$ | Input referred static offset of preamplifier after averaging |
| $\kappa$ | The input capacitance-offset spread product |
| $\Delta \kappa \%$ | Percentage reduction in $\kappa$ |
| $\eta$ | The loss in voltage headroom due to dummy preamplifiers |
| $\xi$ | Input referred offset reduction ratio due to averaging |
| $m$ | Interpolation factor |
| $\Delta$ | The least-significant-bit |
| $\alpha$ | Voltage headroom consumed to ensure the proper biasing of the preamplifiers divided by the supply voltage |
| $\rho$ | Number of preamplifiers within the signal range, divided by the total number of preamplifiers |

## Chapter 1

## Introduction

The implementation of narrowband wireless communication receivers has undergone a revolutionary change, as many of the receiver functions have moved to the digital domain [1]. This change is driven by the way modern CMOS technologies continue to favour digital circuitry, and by the well-known robustness of digital circuitry against temperature, supply, and process variations. Moving many of the receiver functions to the digital domain and using Digital-Signal-Processing (DSP) techniques has also allowed many of the channel impairments to be compensated, and has made many new applications possible. A key factor in the success of these systems has been the advance of large-dynamic-range, low-power sigma-delta Analog-to-Digital Converters (ADCs) that convert real-world analog information into digital bits for further processing. On the other hand, rapidly growing multimedia applications and the ever-increasing demand for higher data rates over wireless channels have led to the evolution of new Ultra-Wide-Band (UWB) wireless standards that require ADCs with moderate resolution but a wide bandwidth. For these standards, only the flash ADC architecture provides the required accuracy at the target analog bandwidth [2].

Optical communication systems can also benefit from the capabilities of DSP. It has been suggested [3] [4] that electronic equalization can be employed to compensate for fibre impairments, especially polarization mode dispersion, so as to replace the bulky and expensive optical compensation techniques with a digital equalizer at the receiver. This requires the insertion of a high-speed ADC in the receiver chain. Since flash ADCs represent the architecture of choice for applications that require high speed and low to moderate resolution, they have been considered to implement such an ADC. This has resulted in a few recent successful implementations in SiGe BiCMOS technology [5, 6, 7]. However, a low-cost CMOS implementation of such designs presents two challenges. The first is to achieve a high sampling speed given the lower gain-bandwidth (lower $f_{t}$) of CMOS technology. The second is to handle the large bandwidth of the input signal with a certain accuracy. Although the first problem can be solved by using the time-interleaved architecture [8, 9, 10, 11], the second problem remains a main obstacle to CMOS implementation. Thus, the feasibility of a CMOS implementation of ADCs for optical communication, or for other wide-band applications, depends mainly on achieving a very small input capacitance (large bandwidth) at the desired accuracy.

The full-flash architecture requires $2^{N}-1$ preamplifiers/comparators at its input to resolve $N$ bits. The static offset of these preamplifiers is what limits the ADC accuracy [12]. However, the preamplifiers' static offset cannot be improved without increasing their input capacitance [13]. Thus, the exponential growth in the number of preamplifiers required by the flash architecture, combined with the input-capacitance-accuracy tradeoff, results in a relatively large input capacitance for the full-flash architecture. This large input capacitance reduces the ADC bandwidth, limiting the highest input signal frequency. One way to alleviate this problem and to increase the operating speed is to use a resistive averaging network to suppress the preamplifier offsets [14]. This allows smaller-area transistors and, therefore, a reduction in the ADC input capacitance at the same accuracy.
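The scaling argument in the paragraph above can be illustrated with a short sketch. It assumes the widely used area-scaling (Pelgrom) mismatch model, $\sigma = A_{Vt}/\sqrt{WL}$, and a gate capacitance proportional to $WL$; neither model is stated in this excerpt. The function names and the numerical values ($A_{Vt}$ = 4 mV·µm, $C_{ox}$ = 12 fF/µm²) are hypothetical and are not taken from the thesis.

```python
def flash_preamp_count(n_bits):
    """Number of preamplifiers/comparators a full-flash ADC needs for n_bits."""
    return 2 ** n_bits - 1


def gate_area_for_offset(a_vt_mv_um, sigma_target_mv):
    """Gate area W*L (um^2) that meets a target offset sigma.

    Assumes sigma = A_Vt / sqrt(W*L), an area-scaling mismatch model
    that is an assumption of this sketch.
    """
    return (a_vt_mv_um / sigma_target_mv) ** 2


def total_input_cap_ff(n_bits, a_vt_mv_um, sigma_target_mv, cox_ff_per_um2):
    """Rough total input capacitance: one gate of area W*L per preamplifier.

    Parasitics and the second transistor of a differential pair are ignored.
    """
    area = gate_area_for_offset(a_vt_mv_um, sigma_target_mv)
    return flash_preamp_count(n_bits) * area * cox_ff_per_um2


# Hypothetical numbers, for illustration only.
for sigma_mv in (2.0, 1.0):
    cap = total_input_cap_ff(6, 4.0, sigma_mv, 12.0)
    print(f"sigma = {sigma_mv} mV -> about {cap:.0f} fF for a 6-bit flash")
```

Under these assumptions, halving the tolerated offset quadruples the gate area, and the $2^{N}-1$ multiplier applies on top of that, which is why any technique that relaxes the per-preamplifier offset requirement translates directly into input capacitance.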

The input capacitance of the flash architecture can also be reduced by using interpolation, where the number of ADC input preamplifiers is reduced and a resistor network interpolates the missing zero-crossings needed for $N$-bit conversion. Since the resistive network also averages out the preamplifier offsets, an interpolating network also works as an averaging network. However, in [15], it has been claimed that the input-referred offset, and thus the accuracy of the ADC, is determined by the aggregate gate area of the input preamplifier FETs. Therefore, using interpolation to reduce the input capacitance would lead to an increase in the input-referred offset of the ADC, compared to that of a full flash architecture. This would outweigh the benefit of using interpolation and render the interpolating architecture equivalent to an averaging full flash architecture that trades accuracy for input capacitance.

The principal drawback of resistive averaging or interpolation is that it shifts the zero-crossing points of the ADC preamplifier array from their ideal positions. This results in systematic non-linearity, especially at the edges of the ADC transfer characteristics. To mitigate this non-linearity, the preamplifier array and the ADC reference-voltage resistor ladder are extended beyond the input signal voltage range using dummy preamplifiers [16, 17, 18]. However, the voltage headroom of the reference ladder represents a main bottleneck to the implementation of flash ADCs in deep-submicron technologies with a reduced supply voltage [19]. Therefore, extending the reference ladder beyond the input signal range makes flash ADCs less amenable to integration in deep-submicron technologies. Moreover, this extension limits the available voltage range for the input signal. In [20], a technique to reduce the over-range penalty was proposed. However, the reference-voltage ladder still consumes more voltage headroom than that required for the input signal. The triple cross-connection method proposed in [21] eliminates the over-range voltage of the reference ladder. However, this method introduces negative transconductance at the edges of the preamplifier array, reducing the effective transconductance, gain, and gain-bandwidth. Also, this method results in a relatively large residual mean Integral-Non-Linearity (INL) value, unless pre-distorted reference voltages are used.

### 1.1 Thesis Contributions

The objective of this thesis is to devise a circuit technique to maximize the bandwidth-accuracy product of flash ADCs, which is crucial for many wide-band wireless and wire-line applications. This is achieved through the following contributions.

- An expression that captures the input-capacitance-accuracy tradeoff of flash ADCs is derived. It is shown that this tradeoff becomes tighter with technology scaling, and hence efficient handling of this tradeoff is essential. Based on that expression, a rigorous analysis of flash ADC architectures as spatial filters is conducted. The analysis proves that, for a given input capacitance, an interpolating architecture achieves a better accuracy than an averaging full-flash architecture. Thus, the interpolating architecture can achieve a superior bandwidth-accuracy product, especially in deep-submicron technologies that use low supply voltages. Circuit-level simulations are used to verify the results. Moreover, the gain obtained when using interpolation is formulated, and the fundamental reason for the superiority of the interpolating architecture is presented. The limitations of the previous claim, which suggests the equivalence between the interpolating architecture and the full-flash averaging architecture regarding their input-capacitance-accuracy tradeoff, are also demonstrated (Chapter 3).
- A technique for terminating the averaging network of flash ADC preamplifiers is devised. The proposed technique eliminates the over-range voltage headroom consumed by flash ADC dummy preamplifiers while maintaining the ADC linearity. Hence, a larger value for the ADC Least-Significant-Bit (LSB) is permitted, and the matching requirements of the preamplifier arrays are relaxed. Thus, a significant reduction in the ADC input capacitance and power dissipation is achieved at the same accuracy. Eliminating the over-range voltage also makes flash ADCs better suited to integration in deep-submicron technologies. The proposed technique overcomes the shortcomings of the triple cross-connection method (Chapter 4).
- A 6-bit 1.6-GS/s flash ADC test chip is designed and fabricated in 0.13-$\mu\mathrm{m}$ CMOS technology. The ADC incorporates the proposed termination technique and takes into account the conclusions of the analysis presented. For this chip, the elimination of the over-range voltage leads to a 20% saving in the ADC input capacitance at the same accuracy, and approximately a 33% reduction in the analog front-end power consumption. The measured Signal-to-Noise-plus-Distortion Ratio (SNDR) is 34.5 dB at a 50-MHz and 30 dB at a 1450-MHz input signal frequency. The measured peak Integral-Non-Linearity (INL) and Differential-Non-Linearity (DNL) are 0.42 LSB and 0.49 LSB, respectively. The total power consumed by the ADC is 180 mW, and hence it achieves a figure-of-merit of 2.6 pJ/conv (Chapter 5 and Chapter 6).
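The figure-of-merit quoted in the last contribution can be cross-checked from the other numbers in that bullet. The sketch below assumes the commonly used definition $\mathrm{FOM} = P / (2^{\mathrm{ENOB}} f_{s})$ evaluated with the low-frequency SNDR; this excerpt does not state which definition the thesis uses, so the formula and the function names are assumptions.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard relation)."""
    return (sndr_db - 1.76) / 6.02


def fom_pj_per_conv(power_w, sndr_db, fs_hz):
    """Energy per conversion step in pJ, assuming FOM = P / (2^ENOB * fs)."""
    return power_w / (2 ** enob_from_sndr(sndr_db) * fs_hz) * 1e12


# Values quoted in the text: 180 mW, 34.5 dB SNDR at 50 MHz, 1.6 GS/s.
fom = fom_pj_per_conv(0.180, 34.5, 1.6e9)
print(f"FOM is about {fom:.2f} pJ/conv")
```

With this assumed definition the result rounds to the 2.6 pJ/conv stated above, which suggests, but does not confirm, that this is the definition behind the quoted value.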


### 1.2 Thesis Organization

After this introductory chapter, Chapter 2 of this thesis discusses high-speed ADC architectures. This includes the flash architecture and its variants, in addition to the subranging, the two-step, and the pipelined architecture. In Chapter 3, the accuracy-bandwidth tradeoff of flash ADCs is analyzed. Chapter 4 reviews previous resistive averaging termination techniques and introduces the proposed termination technique. The design approach of the test chip using the proposed technique is presented in Chapter 5, along with the complete circuit and block level design details. Chapter 6 introduces the testing setup, reports the measurement results, and compares the chip performance to the state-of-the-art ADCs. Finally, Chapter 7 provides thesis conclusions and suggestions for further research.

## Chapter 2

## High-Speed Analog-to-Digital Converters

Due to the wide range of ADC applications, many architectures have been used for ADC implementation. These architectures can be roughly divided into three categories: low-speed high-resolution, medium-speed medium-resolution, and high-speed low-resolution. In this chapter, only high-speed ADC architectures suitable for the GHz range are presented, and their performance tradeoffs are highlighted. The chapter starts with a brief review of the main ADC performance metrics and a discussion of comparators, since they represent the core circuit of ADC architectures. Then, high-speed analog-to-digital conversion architectures are presented, with emphasis on the different one-step flash architectures. The chapter closes with a discussion of ADC figures-of-merit and the effect of technology scaling.

### 2.1 Analog-to-Digital Converters' Performance Metrics

As shown in Fig. 2.1, analog-to-digital conversion is the process of sampling and quantizing the analog input signal into a discrete signal that can take only a finite set of amplitudes. Therefore, there is a range of valid input voltage values that produce the same output.


Figure 2.1: Analog-to-digital conversion.

This ambiguity generates what is known as quantization noise. The range of the input voltages that generates the same output represents the ADC LSB and is given by

$$
\begin{equation*}
LSB=\frac{V_{FS}}{2^{N}} \tag{2.1}
\end{equation*}
$$

where $V_{FS}$ is the full-scale voltage and $N$ is the number of bits. ADCs, like all other mixed-signal circuits, need performance metrics to characterize them. These metrics are described in the following subsections.
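As a small worked example of (2.1), the sketch below computes the LSB and applies an ideal quantizer to a few inputs. The unipolar input range from 0 to $V_{FS}$, the 1-V full scale, and the function names are assumptions made for illustration.

```python
import math


def lsb(v_fs, n_bits):
    """LSB size per (2.1): full-scale voltage divided by 2^N."""
    return v_fs / 2 ** n_bits


def quantize(v_in, v_fs, n_bits):
    """Ideal quantizer for an assumed unipolar range 0 <= v_in < v_fs.

    Every input inside the same LSB-wide interval maps to the same code;
    inputs outside the range are clipped to the first or last code.
    """
    code = math.floor(v_in / lsb(v_fs, n_bits))
    return min(2 ** n_bits - 1, max(0, code))


# Hypothetical 6-bit converter with a 1-V full scale.
print(lsb(1.0, 6))
for v in (0.500, 0.510, 0.516):
    print(v, "->", quantize(v, 1.0, 6))
```

The first two inputs fall within the same LSB-wide interval and so receive the same code, which is the ambiguity that gives rise to quantization noise.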

### 2.1.1 DC Specifications

## Differential Non-Linearity

In an ideal ADC, the reference transition voltages, where the output value changes, are equally spaced by one LSB along the input voltage axis. However, due to circuit non-idealities, these transition voltages deviate from their ideal values, and therefore the step sizes between the transition voltages do not equal one LSB in practical ADCs. DNL measures the variation in the step size compared to the ideal step size. Mathematically,

$$
\begin{equation*}
DNL(j)=\frac{V_{ref(j)}-V_{ref(j-1)}}{LSB}-1 \tag{2.2}
\end{equation*}
$$

where $DNL(j)$ is the differential non-linearity at output code $j$, and $V_{ref(j)}$ is the reference voltage corresponding to code $j$. According to this definition, if the magnitude of the DNL is less than one LSB at every code, then the ADC has no missing codes. It is also worth noting that the DNL directly affects the quantization noise of the ADC.
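Equation (2.2) can be applied directly to a list of measured transition voltages. The sketch below is a minimal illustration; the function names and the transition voltages, expressed in units of one LSB, are made up for the example.

```python
def dnl(v_ref, lsb):
    """DNL(j) for j = 1 .. len(v_ref) - 1, following (2.2)."""
    return [(v_ref[j] - v_ref[j - 1]) / lsb - 1.0 for j in range(1, len(v_ref))]


def has_missing_code(dnl_values):
    """A step of zero width (DNL of -1 LSB or below) means a code never occurs."""
    return any(d <= -1.0 for d in dnl_values)


# Hypothetical transition voltages in units of one LSB.
good = [0.0, 1.0, 2.5, 3.0, 4.0]
bad = [0.0, 1.0, 1.0, 3.0]

print(dnl(good, 1.0), has_missing_code(dnl(good, 1.0)))
print(dnl(bad, 1.0), has_missing_code(dnl(bad, 1.0)))
```

In the second list two transitions coincide, so the step between them has zero width and the corresponding code is missing.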

## Integral Non-Linearity

INL is the deviation of the ADC transfer characteristics from a straight line drawn through its end points. Sometimes another definition of the INL is used, where the INL is measured by comparing the ADC transfer characteristics to the best-fit straight line, chosen such that the maximum difference (or the mean squared error) is minimized. The INL is expressed as

$$
\begin{equation*}
INL(j)=\frac{V_{ref(j)}-V_{ref(j),Ideal}}{LSB} \tag{2.3}
\end{equation*}
$$

where $INL(j)$ is the integral non-linearity at output code $j$, $V_{ref(j),Ideal}$ is the ideal reference voltage at code $j$, and $V_{ref(j)}$ is the actual reference voltage at code $j$. Also, INL can be expressed in terms of the DNL as follows:

$$
\begin{equation*}
INL(j)=\sum_{i=1}^{j} DNL(i) . \tag{2.4}
\end{equation*}
$$

The INL for the entire ADC is defined as the maximum magnitude of the $INL(j)$ values. Since the INL determines the curvature of the transfer characteristics, it represents the actual non-linearity of the ADC and gives an indication of the harmonic distortion introduced by the ADC. A sufficient but not necessary condition for ADC monotonicity is that the maximum INL deviation is less than or equal to 0.5 LSB. Both INL and DNL used to be measured with a slowly increasing ramp input signal. However, a method for measuring the INL and the DNL at the full ADC speed was presented in [22] and subsequently adopted by industry. This method is a better way to measure INL and DNL, since it captures all the high-speed effects that can influence them.
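The relation between (2.3) and (2.4) can be checked numerically. The sketch below computes the INL both ways for the same made-up transition voltages; it assumes the ideal transitions lie on a line that starts at the first measured transition with a slope of one LSB per code, and the function names are illustrative.

```python
from itertools import accumulate


def dnl(v_ref, lsb):
    """DNL(j) for j = 1 .. len(v_ref) - 1, following (2.2)."""
    return [(v_ref[j] - v_ref[j - 1]) / lsb - 1.0 for j in range(1, len(v_ref))]


def inl_from_dnl(dnl_values):
    """INL(j) as the running sum of DNL(1..j), following (2.4)."""
    return list(accumulate(dnl_values))


def inl_direct(v_ref, lsb):
    """INL(j) following (2.3), with ideal transitions at v_ref[0] + j * lsb.

    Anchoring the ideal line at the first transition is an assumption
    of this sketch.
    """
    return [(v_ref[j] - (v_ref[0] + j * lsb)) / lsb for j in range(1, len(v_ref))]


# Hypothetical transition voltages in units of one LSB.
v_ref = [0.0, 1.0, 2.5, 3.0, 4.0]
inl_a = inl_from_dnl(dnl(v_ref, 1.0))
inl_b = inl_direct(v_ref, 1.0)
print(inl_a, inl_b)
print("peak INL:", max(abs(x) for x in inl_a))
```

Both routes give the same profile for this data, and the peak magnitude is the single INL figure that would be reported for the converter.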

### 2.1.2 Dynamic Specifications

## Signal-to-Noise plus Distortion Ratio

SNDR is the most important specification of the ADC, because it includes errors due to non-linearity, thermal noise, quantization noise, and sampling time jitter. Ideally, this SNDR follows [23]:

$$
\begin{equation*}
S N D R_{d B}=6.02 N+1.76 \tag{2.5}
\end{equation*}
$$

## Effective Number of Bits

Since practical ADCs have an SNDR less than that predicted by (2.5), they do not provide the expected accuracy. The ENOB of a certain ADC gives the number of bits of an ideal ADC that would have the same SNDR as that of the ADC under test. In other words, any N-bit ADC with a certain ENOB is equivalent to another ideal ADC with a number of bits equal to ENOB. By definition, ENOB is given by

$$
\begin{equation*}
E N O B=\frac{S N D R_{d B}-1.76}{6.02} \tag{2.6}
\end{equation*}
$$
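As a small worked example of (2.5) and (2.6), the sketch below converts between resolution, ideal SNDR, and ENOB. The function names and the sample SNDR value are assumptions made for illustration.

```python
def ideal_sndr_db(n_bits):
    """Ideal SNDR of an N-bit ADC, Eq. (2.5)."""
    return 6.02 * n_bits + 1.76


def enob(sndr_db):
    """Effective number of bits for a measured SNDR in dB, Eq. (2.6)."""
    return (sndr_db - 1.76) / 6.02


# Hypothetical measurement: a 6-bit ADC with a measured SNDR of 33 dB.
print(ideal_sndr_db(6), enob(33.0))
```

Applying `enob` to the ideal SNDR of an N-bit converter returns N, so any shortfall in measured SNDR shows up directly as lost bits.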

## Spurious-Free Dynamic Range

Due to the ADC nonlinearity, when a pure sinusoidal signal is applied to the ADC input, the resulting output spectrum contains the fundamental in addition to its harmonics. The ratio of the fundamental to the largest distortion component defines the Spurious-Free Dynamic Range (SFDR) of the ADC. The SFDR is an important parameter, when the spectral purity of the ADC is of concern. Since the SFDR depends on ADC distortion, it can be related to INL by [24]

$$
\begin{equation*}
S F D R_{d B}=-20 \log \left(|I N L| 2^{-N}+2^{-1.5 N}\right) . \tag{2.7}
\end{equation*}
$$
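Equation (2.7) can be evaluated directly to see how quickly the INL erodes the SFDR. This is a minimal sketch; the function name `sfdr_db` is hypothetical, and the INL is assumed to be given in LSB.

```python
import math


def sfdr_db(inl_lsb, n_bits):
    """SFDR estimate from the INL (in LSB) and the resolution, Eq. (2.7)."""
    return -20.0 * math.log10(abs(inl_lsb) * 2.0 ** (-n_bits) + 2.0 ** (-1.5 * n_bits))


# Compare a perfectly linear 6-bit ADC with one that has 0.5 LSB of INL.
print(sfdr_db(0.0, 6), sfdr_db(0.5, 6))
```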

## The Effective Resolution Bandwidth

In an ideal ADC, the maximum input signal frequency that can be applied to its input without SNDR degradation is equal to half the maximum sampling frequency of the ADC. However, in an actual ADC, the usable input signal frequency might be lower than that due to the finite bandwidth of its building components. The Effective Resolution Bandwidth (ERBW) is used to define the maximum input frequency at which the output signal SNDR drops by 3 dB or $\frac{1}{2}$ LSB. During the determination of the ERBW of a system, a fixed sampling frequency is used.

### 2.2 Comparators

The comparator represents a main building block of all the analog-to-digital architectures, and its performance parameters such as the input referred offset and power dissipation directly affect the whole ADC accuracy and power consumption. In a typical ADC, the operation of the comparator is synchronized with the sampling clock, and the comparison phase is followed by a latch phase to hold the result of the comparison for a half clock cycle to allow acquisition by the next stage.

The input referred offset of a comparator originates from two different sources. The first source is the threshold voltage mismatch and current factor, $\beta$, mismatch of the differential pair used in the comparison. This type of error is referred to as static offset and is given by $[25]^{1}$

$$
\begin{equation*}
\sigma_{\text {static }}^{2}=\frac{1}{W \cdot L}\left[A_{V t}^{2}+\frac{A_{\beta}^{2}}{4} V_{o v}^{2}\right], \tag{2.8}
\end{equation*}
$$

where $A_{V t}$ is the mismatch coefficient for the FET threshold voltage, $A_{\beta}$ is the mismatch coefficient for $\beta$, $V_{o v}$ is the transistor overdrive voltage, and $W$ and $L$ are the width and length of the preamplifier input differential pair transistors, respectively. The second source is the parasitic capacitance mismatch of the output nodes of the comparator. As a result of this mismatch, different voltage step values would couple to the output nodes at the start of the latch phase, creating offset, as illustrated in Fig. 2.2 [24]. This error is known as the dynamic offset. For a typical high-speed latched comparator, the dynamic offset is much larger than the static one, and dominates the total comparator offset.

[^0]

Figure 2.2: Offset generation mechanisms in a typical comparator.

At the end of the latch phase, the output nodes of the comparator reach the supply rails. This large output swing results in a kick-back effect to the input nodes of the comparator. It is important to lower this kick-back for the comparators used in ADCs, because glitches resulting from the kick-back effect can disturb the operation of the ADC. In addition to the input offset and the kick-back effect, the metastability performance of the comparator latch is a main performance metric for the comparators used in ADCs. This is discussed later in Subsection 5.2.1.

## Offset Cancellation Techniques

Different techniques have been developed to sample the comparator offset and subsequently cancel its effect [26]. In Fig. 2.3(a) the Input-Offset-Storage (IOS) technique is depicted. During $\phi_{1}$, the amplifier is connected in the unity gain mode, and the offset value is stored on the sampling capacitor. Then during $\phi_{2}$, the feedback connection is disconnected and the offset is added to the input voltage, eliminating the offset effect. It is important to note that only the static offset of the amplifier is sampled, whereas the dynamic offset of the latch remains. The residual input referred offset for IOS is [2] [27]

$$
\begin{equation*}
V_{\text {res }}=\frac{\left(C_{s}+C_{p}\right)}{C_{s}} \frac{V_{\text {offset }}}{1+A}+\frac{\Delta Q}{C_{s}}+\frac{V_{\text {latch }}}{A} \tag{2.9}
\end{equation*}
$$

where $A$ is the gain of the amplifier, $V_{\text {offset }}$ is the input referred offset of the amplifier, $V_{\text {latch }}$ is the latch offset, $\Delta Q$ is the charge injection due to the feedback switch, $C_{s}$ is the sampling capacitor, and $C_{p}$ is the parasitic capacitance at the input of the amplifier. Rather than using a sampling capacitor at the input of the amplifier, the offset can be sampled at the amplifier output as shown in Fig. 2.3(b). In the Output-Offset-Storage (OOS) technique, the input is shorted and the amplified offset is stored so that its effect is canceled during comparison. The residual offset for OOS is expressed as

$$
\begin{equation*}
V_{\text {res }}=\frac{\Delta Q}{A C_{s}}+\frac{V_{\text {latch }}}{A} \tag{2.10}
\end{equation*}
$$

Although the comparison of (2.9) to (2.10) suggests that the OOS leads to a smaller residual offset, this is not necessarily the case. The reason is that in the OOS, the amplifier is operated in open loop. Therefore, the comparator is implemented with a small gain $A$ to guarantee operation in the active region under process variations. For the IOS, a much larger gain $A$ can be used. Nonetheless, this results in slower operation if a single stage with a large gain is used. In practical designs, cascaded stages that exploit IOS and OOS are used (Fig. 2.3(c)). This allows the implementation of a large overall gain to suppress the latch dynamic offset with a smaller speed penalty. Flash ADCs that incorporate IOS and OOS have been reported in [2] [28].
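The comparison between (2.9) and (2.10) can be made concrete with a short numerical sketch. All component values below (offsets, capacitances, injected charge, and gains) are hypothetical and chosen only to illustrate the trend; they are not taken from any reported design.

```python
def ios_residual(v_offset, v_latch, gain, c_s, c_p, dq):
    """Residual input referred offset for input offset storage, Eq. (2.9)."""
    return (c_s + c_p) / c_s * v_offset / (1.0 + gain) + dq / c_s + v_latch / gain


def oos_residual(v_latch, gain, c_s, dq):
    """Residual input referred offset for output offset storage, Eq. (2.10)."""
    return dq / (gain * c_s) + v_latch / gain


# Hypothetical values: 10 mV amplifier offset, 50 mV latch offset,
# 100 fF sampling capacitor, 20 fF parasitic, 0.1 fC injected charge.
params = dict(v_latch=50e-3, c_s=100e-15, dq=0.1e-15)

same_gain_ios = ios_residual(v_offset=10e-3, gain=100.0, c_p=20e-15, **params)
same_gain_oos = oos_residual(gain=100.0, **params)
low_gain_oos = oos_residual(gain=10.0, **params)   # open-loop stage limited to a small gain

print(same_gain_ios, same_gain_oos, low_gain_oos)
```

At equal gain the OOS residual is the smaller of the two, but once the open-loop stage is restricted to a low gain, the latch term $V_{\text {latch }} / A$ grows and the OOS residual can exceed that of a high-gain IOS stage.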

### 2.3 The Flash Analog-to-Digital Converters

Typically, high speed ADCs are used in radar applications, hard disk drive read channels, and $\mathrm{Gb} / \mathrm{s}$ communication systems. In these applications, a resolution of only 6 to 8 bits is required. Therefore, high-speed low-resolution architectures can be used for these ADCs. The full-flash requires $2^{N}-1$ comparators to achieve an N-bit resolution, and therefore it is prone to a large input capacitance. Therefore, variants of the full-flash architecture have been used to mitigate this drawback. In the following subsections, the full-flash architecture and its variations are presented, along with the tradeoffs involved in their design.

Figure 2.3: Offset cancellation techniques: (a) input offset storage, (b) output offset storage, and (c) multistage offset cancellation.

## The Full-Flash Architecture

A block diagram of the full-flash architecture is shown in Fig. 2.4. The input signal, $V_{i n}$, is compared to multiple reference voltages, generated by a reference ladder. Any comparator connected to a resistor string node where $V_{r e f(j)}$ is less than $V_{i n}$ produces a '1', whereas those connected to nodes with $V_{r e f(j)}$ greater than $V_{i n}$ produce a '0'. This output code is commonly referred to as a thermometer code. The thermometer code is fed to an encoder that generates the binary digital output. For an N-bit ADC, the number of comparators and preamplifiers is $2^{N}-1$. The large number of preamplifiers connected to the input results in a large input capacitance, reducing the bandwidth of the ADC.
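The conversion described above can be modeled behaviorally. The sketch below is an idealized illustration, assuming uniformly spaced ladder taps and a simple ones-counting encoder; the function name `flash_convert` and its arguments are hypothetical, and no offsets, noise, or bubble errors are modeled.

```python
def flash_convert(v_in, v_ref_minus, v_ref_plus, n_bits):
    """Ideal full-flash conversion: returns (thermometer_code, binary_code)."""
    n_comp = 2 ** n_bits - 1                         # number of comparators
    lsb = (v_ref_plus - v_ref_minus) / 2 ** n_bits
    # Ladder taps V_ref(j), assumed equidistant (ideal resistor string).
    v_ref = [v_ref_minus + (j + 1) * lsb for j in range(n_comp)]
    # A comparator outputs 1 when its reference is below the input.
    thermometer = [1 if v_in > v else 0 for v in v_ref]
    # Encoder modeled as a ones counter (one possible encoding scheme).
    return thermometer, sum(thermometer)


# Example: 3-bit converter with a 0 V to 1 V reference range.
thermo, code = flash_convert(0.55, 0.0, 1.0, 3)
print(thermo, code)
```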

The flash ADC, in principle, does not need a sample-and-hold circuit at its input. However, using a sample-and-hold circuit greatly improves the dynamic behaviour and reduces errors due to [24]:

- skew in clock delivery to a large number of comparators
- limited input bandwidth prior to latch regeneration
- signal dependent dynamic nonlinearity

As shown in Fig. 2.4, the preamplifiers' inputs are connected to both the input signal and the reference ladder. Hence, the preamplifier input transistor gate-source parasitic capacitance, $C_{g s}$, couples the input to the reference ladder. Since the reference ladder is responsible for producing the equidistant reference voltages, the coupling must be minimized. For properly decoupled $V_{r e f+}$ and $V_{r e f-}$, the maximum feedthrough occurs at the mid node of the reference ladder. The value of the feedthrough depends on the value of the resistor string resistance according to [29]


Figure 2.4: The full-flash architecture.

$$
\begin{equation*}
\frac{v_{\text {feedthrough }}}{v_{\text {in }}}=\frac{\pi}{4} f_{\text {sig }} R C, \tag{2.11}
\end{equation*}
$$

where R is the total resistor ladder resistance, C is the total coupling capacitance of the input preamplifiers, and $f_{s i g}$ is the input signal frequency. Hence, to reduce feedthrough, R needs to be as small as possible. However, this results in an increase in the power dissipation. In practical implementations, the reference ladder resistor values are in the range of tens of Ohms. Therefore, the reference ladder is usually implemented using a metal layer [30].
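To get a feel for the numbers implied by (2.11), the sketch below computes the feedthrough ratio and the largest ladder resistance that keeps it below a chosen fraction of an LSB. The half-LSB budget, the full-scale input assumption, and the example values of frequency and capacitance are assumptions of this sketch, not figures from the text.

```python
import math


def feedthrough_ratio(f_sig, r_ladder, c_total):
    """Mid-ladder feedthrough relative to the input amplitude, Eq. (2.11)."""
    return math.pi / 4.0 * f_sig * r_ladder * c_total


def max_ladder_resistance(f_sig, c_total, n_bits, budget_lsb=0.5):
    """Largest R keeping the feedthrough below budget_lsb LSB.

    Assumes a full-scale input, so that 1 LSB corresponds to a ratio of 2**-n_bits.
    """
    return budget_lsb * 2.0 ** (-n_bits) * 4.0 / (math.pi * f_sig * c_total)


# Hypothetical case: 1 GHz input, 1 pF total coupling capacitance, 6 bits.
r_max = max_ladder_resistance(1e9, 1e-12, 6)
print(r_max, feedthrough_ratio(1e9, r_max, 1e-12))
```

Under these assumed values the resulting resistance is on the order of ten Ohms, in line with the range quoted above.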

At high-speed operation, comparators used for comparing the input voltage to reference voltages suffer from a relatively large dynamic offset due to the capacitive coupling between their outputs and the clock signal. This dynamic offset leads to a large input referred offset. To mitigate this input referred offset, preamplifiers are used at the inputs of comparators. Although the large input referred dynamic offset of the comparator is reduced by the insertion of the preamplifier, the static offset of the preamplifier remains. This, nevertheless, yields an improvement, since the comparator dynamic offsets are much higher than the preamplifiers static offset.

The input referred offset voltage degrades the performance of the ADC substantially, since it appears as an additional voltage, added to the resistor string reference voltages. Since both DNL and INL are functions of the reference voltages, the input referred offset degrades both the DNL and INL leading to an increase in the quantization noise and the ADC distortion. The total RMS input referred offset of the preamplifier/comparator structure can be described by

$$
\begin{equation*}
\sigma_{o s}^{2}=\sigma_{p a}^{2}+\frac{\sigma_{c o m p}^{2}}{A_{p a}^{2}}, \tag{2.12}
\end{equation*}
$$

where $\sigma_{o s}$ is the resultant RMS input referred offset voltage, $\sigma_{p a}$ is the preamplifier RMS offset voltage, $\sigma_{\text {comp }}$ is the comparator RMS offset, and $A_{p a}$ is the preamplifier gain. The static offset of the preamplifiers arises from the finite matching between the transistors of the differential structures and is given by (2.8). An efficient way to reduce the preamplifier offset is to use averaging, as discussed in the next subsection. In practice, the preamplification stage of a flash ADC is designed to provide enough gain to almost eliminate the comparators' contribution to the input referred offset, and hence the static offset of the preamplifiers dominates the total input referred offset [12] [2].
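Equations (2.8) and (2.12) can be combined to see how the preamplifier gain suppresses the comparator contribution. The following is a minimal sketch; the mismatch coefficients, transistor dimensions, and offsets are hypothetical values chosen for illustration, with voltages in mV and dimensions in µm.

```python
import math


def static_offset_sigma(w, l, a_vt, a_beta, v_ov):
    """RMS static offset of a differential pair, Eq. (2.8).

    Assumed units: w, l in um; a_vt in mV*um; a_beta in um (fractional); v_ov in mV.
    """
    return math.sqrt((a_vt ** 2 + (a_beta ** 2 / 4.0) * v_ov ** 2) / (w * l))


def total_offset_sigma(sigma_pa, sigma_comp, a_pa):
    """RMS input referred offset of the preamplifier/comparator pair, Eq. (2.12)."""
    return math.sqrt(sigma_pa ** 2 + (sigma_comp / a_pa) ** 2)


# Hypothetical device: W = 4 um, L = 0.25 um, A_Vt = 5 mV*um, A_beta = 1 %*um, V_ov = 200 mV.
sigma_pa = static_offset_sigma(4.0, 0.25, 5.0, 0.01, 200.0)
for gain in (1.0, 3.0, 10.0):
    # Hypothetical 30 mV RMS comparator (dynamic) offset.
    print(gain, total_offset_sigma(sigma_pa, 30.0, gain))
```

As the gain increases, the total approaches the preamplifier static offset alone, which is the situation described in the text.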

## Averaging

The technique of offset reduction using averaging was first introduced by Kattmann and Barrow [14]. In this technique, resistors are added between the preamplifier outputs, as depicted in Fig. 2.5, to average out error sources and hence reduce the effect of offset.

Although averaging reduces the input referred offset, it degrades the INL at the edges of the resistor ladder, and thus requires over-range amplifiers to maintain linearity at the edges of the conversion range. The effect of averaging on offset reduction, along with the edge effect, is discussed in detail in Chapter 3.


Figure 2.5: Averaging the output of preamplifiers.

### 2.3.1 The Interpolating Flash Architecture

Interpolation is used to reduce the input capacitance of flash ADCs. Fig. 2.6 shows an interpolating architecture with a $\times 4$ interpolation factor. The number of input preamplifiers is significantly reduced by a factor equal to the interpolation factor. To restore the number of zero-crossings needed for the N-bit conversion, a resistor voltage divider interpolates the outputs of the preamplifiers to generate the missing zero-crossings.


Figure 2.6: The interpolating architecture. Interpolation factor $=4$.

Interpolation sets a lower limit on the linearity of the preamplifiers. Correct zero crossing interpolation requires the linear range of the preamplifiers to extend to the zero-crossing of the transfer characteristics of the adjacent preamplifier. This is illustrated in Fig. 2.7.
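The linearity requirement can be illustrated with a behavioral sketch. Here each preamplifier is modeled by a `tanh` characteristic, which is an assumption of this sketch (a real CMOS differential pair has a different large-signal shape), and the interpolated zero-crossing between two adjacent preamplifiers is found by bisection. The names `preamp` and `interpolated_crossing` and the parameter `v_lin` are hypothetical.

```python
import math


def preamp(v_in, v_ref, v_lin):
    """Saturating preamplifier model; v_lin sets the width of the linear range."""
    return math.tanh((v_in - v_ref) / v_lin)


def interpolated_crossing(k, factor, v_ref_a, v_ref_b, v_lin):
    """Zero-crossing of tap k (0 < k < factor) on a resistor string between
    the outputs of two preamplifiers with references v_ref_a < v_ref_b."""
    w = k / factor

    def tap(v):
        return (1.0 - w) * preamp(v, v_ref_a, v_lin) + w * preamp(v, v_ref_b, v_lin)

    lo, hi = v_ref_a, v_ref_b          # tap(lo) < 0 < tap(hi), tap is monotonic
    for _ in range(60):                # plain bisection
        mid = 0.5 * (lo + hi)
        if tap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Ideal crossings for x4 interpolation between references 0 and 1 are 0.25, 0.5, 0.75.
for v_lin in (10.0, 0.1):              # wide versus narrow linear range
    print(v_lin, [interpolated_crossing(k, 4, 0.0, 1.0, v_lin) for k in (1, 2, 3)])
```

In this model, a wide linear range places the interpolated crossings close to their ideal, equally spaced positions, whereas a narrow linear range pulls the first and third crossings towards the nearest preamplifier reference. The middle crossing stays in place by symmetry.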

The main drawback of interpolation is the different delays experienced by signals traveling to different comparators. This is due to the different impedances seen by the comparators looking back into the resistive string. This delay variation becomes significant when a large interpolation ratio is used or at high sampling speeds. Different ways to equalize this delay have been suggested in [31] and [32].

Figure 2.7: Preamplifiers' transfer characteristics and interpolated transfer characteristics for two cases: (a) the preamplifier's linear range does not extend to the zero-crossing of the neighbouring preamplifier, and (b) the preamplifier's linear range extends to the zero-crossing of the neighbouring preamplifier.

### 2.3.2 Capacitive Interpolation and Capacitive Generation of Reference Voltages

In the capacitive interpolation technique, capacitors, rather than resistors, are used to interpolate the required zero-crossings for the flash ADC, as shown in Fig. 2.8. It is typically used in conjunction with IOS comparators such that the same capacitors that sample the offset are also used to create the potential divider needed for interpolation. In [33, 34, 35], the capacitive interpolation technique, combined with a resistive reference voltage ladder, was utilized to implement subranging $\mathrm{ADCs}^{2}$. In [2], capacitive interpolation is used to build a 1.2 GS/s flash ADC. Also, the resistive reference ladder is replaced by a capacitor voltage divider ($C_{1}$ and $C_{2}$ in Fig. 2.8) to save the power consumed by the resistive ladder. The incorporation of capacitive interpolation in the flash architecture has the advantage of eliminating the need for over-range amplifiers. However, capacitive interpolation requires non-overlapping multi-phase clocks. This limits the maximum sampling speed that can be achieved, and thus the maximum sampling speed reported in [2] is less than that allowed by the technology for a resistive interpolating architecture.

### 2.3.3 The Folding Architecture

Although interpolation reduces the number of input preamplifiers, the number of comparators remains as $2^{N}-1$. This number of comparators can be reduced through the use of the folding architecture. The basic idea of folding is to fold the input signal into a smaller range, as shown in Fig. 2.9, such that each comparator is used more than once throughout the input range. The number of folds is equal to the folding factor $\left(F_{F}\right)$.
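The basic folding idea can be expressed with an idealized triangular transfer characteristic. This sketch assumes perfectly sharp folds (no rounding at the edges) and equal fold widths; the function name `fold` is hypothetical.

```python
def fold(v_in, v_fs, folding_factor):
    """Ideal folding of v_in in [0, v_fs]: returns (fold_index, folded_value)."""
    seg = v_fs / folding_factor                        # width of one fold
    idx = min(int(v_in // seg), folding_factor - 1)    # which fold the input lies in
    r = v_in - idx * seg                               # position inside the fold
    folded = r if idx % 2 == 0 else seg - r            # odd folds have reversed slope
    return idx, folded


# With a folding factor of 4, inputs in different folds map onto the same
# reduced range, so the same fine comparators can be reused for every fold.
for v in (0.10, 0.30, 0.60, 0.90):
    print(v, fold(v, 1.0, 4))
```

The fold index corresponds to the information resolved by the coarse (MSB) circuit, and the folded value is what the fine comparators see.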

When folding was first introduced, a single folding circuit was used to implement the transfer characteristics in Fig. 2.9, and then the output signal from the folding circuit was applied to the comparators to determine where the input lies within the fold. A separate circuit determines which fold the input lies in. This circuit is responsible for generating the Most-Significant-Bits (MSBs). However, this arrangement had a severe disadvantage, since practically all the folder transfer characteristics exhibit rounding at the edges, limiting the useful range of the folder. To overcome this problem, the folding architecture was modified. The modified architecture utilizes several folders instead of just one (Fig. 2.10), and only the zero-crossing points of the folder output voltages are utilized. Therefore, the performance is not affected by the rounding at the edges or the non-linearity of the transfer characteristics. The architecture of Fig. 2.10 is called offset parallel folding [12].

[^1]

Figure 2.8: Capacitive interpolation.

Figure 2.9: The folder transfer characteristics.

A folding circuit can be easily implemented as in Fig. 2.11. It is built of several amplifiers connected in parallel, where half of the amplifiers have a reversed polarity in an alternating fashion. It can be inferred from the circuit that only one amplifier should be working in its linear region for any input, so as not to lower the folder gain. Therefore, folding imposes limitations on the maximum linear range of the constituent amplifiers. This limits the value of the transistor overdrive voltage $\left(V_{G S}-V_{T}\right)$, and can lead to large transistors, and therefore a large input capacitance [36]. Since, for any input signal value to the circuit of Fig. 2.11, three differential pairs would be saturated, one of the load resistors would carry a DC current of $I_{s s}$, and the other resistor would carry a DC current of $2 I_{s s}$. These different currents would mask the real signal current of the fourth differential pair performing the reference comparison. Therefore, in practical implementations, an extra differential pair is added when the number of folding differential pairs is even. This sets the common mode voltage of the two output nodes at the same value.

The folded signal at the output of the folder crosses the comparator threshold $F_{F}$ times when the input is allowed to change through its full range. In other words, the folder output signal frequency is $F_{F}$ times the input frequency. In the absence of a sample-and-hold circuit, the increased frequency of the folder outputs sets a limit to the maximum input frequency. However, if a sample-and-hold circuit is used, the maximum input frequency is set by the settling time of the folding circuit.

Figure 2.10: The folding architecture, folding factor $=4$.

In the folding architecture of Fig. 2.10, the number of folders is $\frac{2^{N}}{F_{F}}$, and each folder is built of a number of amplifiers equal to the folding factor. Hence, the number of preamplifiers at the input of the ADC is the same as in the case of the full-flash ADC. If it is required to reduce both the number of comparators and preamplifiers, a folding-interpolating architecture is adopted, as presented in the next subsection.


Figure 2.11: A CMOS folding circuit.

### 2.3.4 The Folding-Interpolating Architecture

The large number of preamplifiers and comparators needed for implementing the full-flash architecture limits its use to a 6-bit resolution [37]. For high speed applications requiring higher resolution (8 bit to 12 bit), the folding-interpolating architecture is successfully used [29, 31, 38, 39]. Fig. 2.12 presents the folding-interpolating architecture. It consists of an L-bit coarse ADC that determines which fold the input signal lies in, and an M-bit fine ADC that resolves the position of the input signal within the fold. The total ADC resolution is

$$
\begin{equation*}
N=L+M . \tag{2.13}
\end{equation*}
$$

The coarse ADC must have enough bits to represent all the folds, that is,

$$
\begin{equation*}
F_{F}=2^{L} . \tag{2.14}
\end{equation*}
$$

Then, the number of folders is

$$
\begin{equation*}
N_{F B}=\frac{2^{M}}{F_{I T P L}}, \tag{2.15}
\end{equation*}
$$



where $F_{I T P L}$ is the interpolation factor. To avoid using large interpolating and folding factors, cascaded stages of interpolation and folding are usually utilized (Fig. 2.13). In short, interpolation and folding are used to reduce the complexity of the flash ADC, and therefore, its power dissipation and area. However, since folders have a relatively large settling delay, they limit high speed operation.

Figure 2.12: The folding-interpolating architecture.
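Relations (2.13) to (2.15) can be collected into a small sizing helper. This is a sketch under stated assumptions: the function name and the returned field names are hypothetical, and the input amplifier count assumes, as stated for the folding architecture, that each folder contains a number of amplifiers equal to the folding factor.

```python
def folding_interpolating_sizes(n_bits, coarse_bits, interp_factor):
    """Block counts of a folding-interpolating ADC from Eqs. (2.13)-(2.15)."""
    fine_bits = n_bits - coarse_bits                 # Eq. (2.13): N = L + M
    folding_factor = 2 ** coarse_bits                # Eq. (2.14): F_F = 2^L
    if (2 ** fine_bits) % interp_factor != 0:
        raise ValueError("2^M must be divisible by the interpolation factor")
    n_folders = 2 ** fine_bits // interp_factor      # Eq. (2.15)
    return {
        "fine_bits": fine_bits,
        "folding_factor": folding_factor,
        "n_folders": n_folders,
        # Assumption: one input amplifier per fold in every folder.
        "input_amplifiers": n_folders * folding_factor,
        "full_flash_preamps": 2 ** n_bits - 1,       # reference point for comparison
    }


# Example: 8-bit ADC with a 3-bit coarse ADC and x4 interpolation.
print(folding_interpolating_sizes(8, 3, 4))
```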


Figure 2.13: The cascaded folding-interpolating architecture.

### 2.3.5 Calibration of Flash ADCs

Although calibration has been widely employed in medium-speed ADCs such as pipelined ADCs to mitigate the errors of their building blocks, calibration has not been heavily applied to high-speed flash ADCs. Recently though, with the growing trend to use digital calibration to correct for the errors of analog circuits, many flash ADCs that use calibration have been reported [40, 41, 42]. Contrary to the offset cancellation techniques in Section 2.2 that deal mainly with the static offset, calibration can be used to correct for both the static offset and the latch dynamic offset.

Calibration techniques are categorized as either foreground or background techniques. In the foreground technique, normal operation is interrupted to start a calibration cycle.

Many video and communication standards define a standby time that would allow foreground calibration. Otherwise, foreground calibration can be carried out once at power up. However, any temperature or supply variation that occurs during normal operation can render the error measured during the initial calibration (and consequently the calibrating signal) invalid ${ }^{3}$. Various comparator foreground calibration techniques are shown in Fig. 2.14. In Fig. 2.14(a), the comparator offset is corrected by applying a digitally controlled current to the input of the differential pair [39]. During calibration, an on-chip calibrating input signal is applied to the input of the ADC, and a control unit determines the values of the calibrating currents based on the detected ADC digital output. The calibration technique in Fig. 2.14(b) applies calibration currents to the output nodes instead of the input nodes [40], whereas in Fig. 2.14(c), the capacitances of the output nodes are changed to correct for the latch dynamic offset [41].

The background calibration techniques do not interrupt the normal operation, and are assigned a periodic time slot of the system clock. In Fig. 2.15, the background calibration technique of [42] is shown. In this method, the input of the preamplifier is shorted during calibration such that the output of the comparator depends solely on its offset. The decision made by the comparator is used by a control unit to adjust the input voltage to an auxiliary differential pair, connected to the comparator outputs. The current produced by the differential pair balances the offset eliminating its effect.
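The feedback principle of such a loop can be sketched behaviorally. The model below is a loose illustration and not the circuit of [42]: the control unit is assumed to be a simple up/down controller that moves a correction voltage by a fixed step after every calibration slot, and all names and values are hypothetical.

```python
def calibrate_offset(v_offset, step, n_cycles):
    """Return the correction voltage after n_cycles calibration slots.

    In each slot the preamplifier input is shorted, so the comparator
    decision depends only on the residual offset (v_offset + v_cal).
    """
    v_cal = 0.0
    for _ in range(n_cycles):
        decision = 1 if (v_offset + v_cal) > 0.0 else 0   # comparator output
        # Assumed control law: step the correction against the decision.
        v_cal += -step if decision else step
    return v_cal


# Hypothetical 7.3 mV offset corrected with 0.5 mV steps (values in mV).
v_cal = calibrate_offset(7.3, 0.5, 50)
print(v_cal, 7.3 + v_cal)    # correction and residual offset
```

Once the loop has converged, the residual toggles around zero and stays within one correction step, which suggests that the step size sets the achievable accuracy in this simple model.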

To avoid the sampling speed reduction due to the calibration time slot, in some implementations background calibration is combined with time-interleaving of more than one ADC, as in the case of the pipelined ADC in $[43,44]$ and the subranging ADC in [42]. Since background calibration works continuously, it is more robust to temperature and supply variation, compared to foreground calibration.

It is noteworthy that optimizing the ADC linearity and dynamic performance prior to introducing calibration is essential to maintain a small realizable dynamic calibration range and to simplify the calibration circuitry [39]. Hence, calibration techniques do not eliminate the need for averaging or preamplification as a means to reduce the input referred offset.

[^2]

Figure 2.14: Foreground calibration of comparators used in flash ADCs: (a) applying current at the input, (b) applying current at the output, and (c) varying output capacitance.

Figure 2.15: Background calibration of comparator offset.

### 2.4 The Two-Step Analog-to-Digital Converters

The two-step architecture breaks the exponential dependence of the area, power, and input capacitance of the flash ADC on the target resolution. The block diagram of the two-step ADC architecture is illustrated in Fig. 2.16 for a target resolution of $\mathrm{L}+\mathrm{M}$ bits. The number of required comparators is $2^{L}+2^{M}-2$, compared to $2^{L+M}-1$ in the case of the full-flash architecture. In the two-step architecture, the input signal is applied to a coarse ADC that generates an estimate of the input value (the L MSBs). Subsequently, the residue resulting from subtracting the coarse estimate from the original input is applied to a fine ADC that determines the M least-significant-bits. Although the coarse ADC needs to be designed to L bits accuracy only ${ }^{4}$, the sample-and-hold circuit, the subtractor, and the Digital-to-Analog Converter (DAC) must achieve an L+M bits accuracy [23] [27]. The fine ADC can be preceded by an amplifier to relax its accuracy requirements. Otherwise, the fine ADC must be designed to the same accuracy level as the whole system as well. Since the two-step architecture is typically used for resolutions above 8 bits, the resulting speed-accuracy tradeoff for the high accuracy blocks leads to a long settling time. Hence, the reduced hardware and the better power efficiency of the two-step ADC, compared to that of the flash ADC, come at the expense of a lower achievable sampling speed.

[^3]

Figure 2.16: The two-step ADC architecture.
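The hardware saving quoted above is easy to tabulate. The following sketch simply evaluates the two comparator-count expressions; the function names are illustrative.

```python
def two_step_comparators(l_bits, m_bits):
    """Comparators in a two-step ADC with L coarse and M fine bits."""
    return 2 ** l_bits + 2 ** m_bits - 2


def full_flash_comparators(n_bits):
    """Comparators in a full-flash ADC with N bits."""
    return 2 ** n_bits - 1


# Compare both architectures for equal coarse/fine splits.
for l_bits, m_bits in ((3, 3), (4, 4), (5, 5)):
    print(l_bits + m_bits,
          two_step_comparators(l_bits, m_bits),
          full_flash_comparators(l_bits + m_bits))
```

For a 10-bit converter split as 5 + 5, the count drops from 1023 to 62 comparators.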

### 2.5 The Two-step Subranging Analog-to-Digital Converters

The subranging architecture is a two-step architecture with no explicit subtractor. In this architecture, the full input signal range is divided into subranges. Based on the input signal value, the coarse ADC produces the MSBs that determine the subrange in which the signal lies. A DAC uses the generated MSBs to deliver reference voltages within the selected subrange to the fine ADC. A resistor ladder DAC is commonly used to implement the subranging technique, as shown in Fig. 2.17. The DAC and fine ADC should have the same accuracy as that of the whole subranging ADC. This limits the highest attainable sampling speed, as for the architecture of Section 2.4. A state-of-the-art sampling speed of $160 \mathrm{MS} / \mathrm{s}$ at a resolution of 10 bits has been reported in 90-nm technology [45].


Figure 2.17: The two-step subranging architecture.

### 2.6 Pipelined Analog-to-Digital Converters

The pipelined architecture extends the two-step architecture to a multistage architecture, where each stage resolves a few bits of the final digital word (Fig. 2.18). The sample-and-hold operation preceding each stage allows the stages to process different samples simultaneously. Hence, the throughput of the ADC is limited by that of one stage. However, the cascaded stages result in a large latency ${ }^{5}$. A major advantage over the flash architecture is that the power dissipation and area of pipelined ADCs grow linearly with the number of bits (rather than exponentially, as in the flash architecture). However, the op-amps used to implement the interstage gain suffer from a tight gain-bandwidth tradeoff, limiting the maximum sampling speed. Also, the op-amp consumes a large amount of power to allow high speed operation [37]. Nonetheless, in the approach of Comparator-Based Switched-Capacitor (CBSC) circuits introduced in [46], the op-amp is replaced with a comparator and a current source, resulting in a power saving. One year later, an even more power efficient technique was developed, where a zero-crossing detector replaces the comparator [47].

[^4]

Figure 2.18: Conceptual block diagram of pipelined ADC.

Typically, pipelined ADCs are used for medium resolution (8 to 14 bits), high speed applications. At the 14-bit resolution level, a $224-\mathrm{mW}$ pipelined ADC with a $100-\mathrm{MS} / \mathrm{s}$ sample rate has been reported in $0.13-\mu \mathrm{m}$ CMOS technology [48], and at 8 bits and 10 bits of resolution, sampling speeds as high as $200 \mathrm{MS} / \mathrm{s}$ have been achieved [47] [49]. Also, pipelined ADCs with low resolutions (5 to 6 bits) and sampling speeds exceeding $500 \mathrm{MS} / \mathrm{s}$ have been demonstrated [50] [51].

### 2.7 Analog-to-Digital Converters Figures of Merit

Figures-Of-Merit (FsOM) are devised such that the key performance metrics of the design are combined to arrive at a single number that can be used for comparing different realizations. The most widely used FsOM for ADCs are

$$
\begin{gather*}
\mathrm{FOM}_{1}=\frac{\text { Power }}{2^{\mathrm{ENOB}_{@ \mathrm{DC}}} \cdot 2 \cdot \mathrm{ERBW}}, \tag{2.16}\\
\mathrm{FOM}_{2}=\frac{\text { Power }}{2^{\mathrm{ENOB}_{@ \mathrm{DC}}} \cdot f_{s}}, \tag{2.17}
\end{gather*}
$$

and

$$
\begin{equation*}
\mathrm{FOM}_{3}=\frac{\text { Power }}{2^{\mathrm{ENOB}_{@ f_{i n}}} \cdot 2 \cdot f_{i n}}, \tag{2.18}
\end{equation*}
$$

where $f_{s}$ is the sampling speed, and $f_{i n}$ is the input analog signal frequency. Although these FsOM capture the power-speed tradeoff, they do not correctly handle the power-resolution tradeoff, since they assume that increasing the resolution by 1 bit is equivalent to doubling the power dissipation [52]. In thermal-noise limited ADCs ${ }^{6}$, reducing the noise level by 6 dB requires increasing the capacitance level by 4 times ${ }^{7}$. As a result, the transconductance of the amplifiers and the power have to be raised by the same factor to maintain the same gain-bandwidth, and hence, the same operating speed. Mismatch-limited ADCs ${ }^{8}$ can require even more power for an extra bit of resolution. For instance, in flash ADCs, one extra bit of resolution needs two times the number of comparators. These comparators need to exhibit double the precision, and therefore their transistors may be sized 4 times larger, resulting in a net increase in the power dissipation of 8 times. Therefore, the FsOM of (2.16), (2.17), and (2.18) should not be used to compare ADCs with different resolutions.
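The three figures of merit can be evaluated with the short sketch below, which assumes that each denominator is the product of $2^{\mathrm{ENOB}}$ and the corresponding frequency term. The function names and the example operating point are hypothetical; with power in watts and frequency in hertz, the result is in joules per conversion step.

```python
def fom1(power, enob_dc, erbw):
    """FOM1 of Eq. (2.16): power / (2^ENOB@DC * 2 * ERBW)."""
    return power / (2.0 ** enob_dc * 2.0 * erbw)


def fom2(power, enob_dc, f_s):
    """FOM2 of Eq. (2.17): power / (2^ENOB@DC * f_s)."""
    return power / (2.0 ** enob_dc * f_s)


def fom3(power, enob_fin, f_in):
    """FOM3 of Eq. (2.18): power / (2^ENOB@f_in * 2 * f_in)."""
    return power / (2.0 ** enob_fin * 2.0 * f_in)


# Hypothetical 6-bit flash ADC: 100 mW, ENOB of 5.5 at DC and 5.0 at f_in,
# 1 GS/s sampling rate, 500 MHz ERBW, 400 MHz test input.
print(fom1(0.1, 5.5, 500e6), fom2(0.1, 5.5, 1e9), fom3(0.1, 5.0, 400e6))
```

Because all three scale as $2^{-\mathrm{ENOB}}$, adding one bit at twice the power leaves them unchanged, which is exactly the assumption criticized above.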

The effect of technology scaling on the power, sampling speed, and accuracy of flash ADCs has been studied by Uyttenhove and Steyaert in [30], and it was shown that these three performance measures can be related as follows:

$$
\begin{equation*}
\frac{\text { Power }}{\text { Speed } \times \text { Accuracy }} \approx \frac{C_{o x}}{A_{V t}^{2}} \text {, } \tag{2.19}
\end{equation*}
$$

where $C_{o x}$ is the gate capacitance per unit area. In (2.19), the gain-bandwidth product is used as a measure for the sampling speed, whereas the accuracy is represented by the ratio of the static offset to the LSB. The technology parameter $A_{V t}$ is proportional to the gate oxide thickness, $t_{o x}$, and the gate capacitance per unit area is inversely proportional to $t_{o x}$. Hence, with technology scaling, (2.19) predicts a reduction in $\mathrm{FOM}_{2}$. However, the work in [25] predicted that for technologies with a feature size less than $0.35 \mu \mathrm{~m}$, the improvement in $\mathrm{FOM}_{2}$ is mitigated by the reduction in the supply voltage. Also, for future nano-technologies, Uyttenhove and Steyaert expected that $\beta$ mismatch will dominate the offset, resulting in an increase in the value of $\mathrm{FOM}_{2}$ [30]. However, the reported values of $\mathrm{FOM}_{1}$ and $\mathrm{FOM}_{2}$ in the literature indicate continuous improvement in the ADC FsOM, as shown in Fig. 2.19 and Fig. 2.20.

[^5]


Figure 2.19: The ADC Figure of Merit $\mathrm{FOM}_{1}$ as a function in technology feature size.


Figure 2.20: The ADC Figure of Merit $\mathrm{FOM}_{2}$ as a function of technology feature size.

## Chapter 3

## Flash ADC Design for a Wide Bandwidth

The objective of this chapter is to compare the full-flash averaging architecture to the interpolating architecture regarding their input capacitance and accuracy. The chapter presents an analysis showing that an interpolating architecture has a superior bandwidth-accuracy performance. The results of the analysis are verified with circuit simulations, and the principal reason for the superiority of the interpolating architecture is presented.

### 3.1 The Bandwidth-Accuracy Tradeoff of Flash ADCs

The bandwidth of high speed flash ADCs is mainly determined by the track-and-hold (T/H) circuit at its input [24] [12] [53]. Thus, the input capacitance of the ADC preamplifiers array that loads the $\mathrm{T} / \mathrm{H}$ circuit should be reduced for wide band applications. The input referred static offset RMS value of the flash ADC preamplifiers can be approximated as [13] ${ }^{1}$

$$
\begin{equation*}
\sigma_{o f f s e t} \approx \frac{A_{V t}}{\sqrt{W \cdot L}} . \tag{3.1}
\end{equation*}
$$

[^6]Eq. (3.1) reveals that to reduce the offset of the preamplifiers, the area of the FET transistors needs to be increased. This would increase the input capacitance of the preamplifiers, and hence would result in a reduction of the input signal bandwidth. Therefore, the offset-area tradeoff leads to a bandwidth-accuracy tradeoff.
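A small numerical sketch of (3.1) makes the offset-area tradeoff concrete. The matching coefficient and the device dimensions below are hypothetical values chosen only for illustration; the point is that halving the offset requires four times the gate area, and therefore roughly four times the gate capacitance.

```python
import math


def sigma_offset(a_vt: float, w: float, l: float) -> float:
    """Eq. (3.1): input referred offset RMS value (units of a_vt per unit length)."""
    return a_vt / math.sqrt(w * l)


A_VT = 5.0         # mV*um, hypothetical matching coefficient
W, L = 4.0, 0.25   # um, hypothetical device dimensions

base = sigma_offset(A_VT, W, L)
quad = sigma_offset(A_VT, 4 * W, L)   # four times the gate area
print(f"sigma_offset = {base:.2f} mV; with 4x the area: {quad:.2f} mV")
```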

The averaging technique [14] can ease this bandwidth-accuracy tradeoff. In this technique, the outputs of adjacent preamplifiers are connected with resistors. This averages out the random zero-crossing shift due to the preamplifiers' offset voltages, so that smaller transistors can be used. The amount of offset reduction, $\xi$, obtained by averaging, is defined as

$$
\begin{equation*}
\xi=\frac{\sigma_{o f f s e t}}{\sigma_{o f f s e t}^{\prime}} \tag{3.2}
\end{equation*}
$$

where $\sigma_{o f f s e t}$ and $\sigma_{o f f s e t}^{\prime}$ are the preamplifier input referred offset RMS values before and after averaging, respectively. The value of $\xi$ depends on the ratio of the averaging resistor value, $R_{1}$, to the preamplifier output resistance, $R_{0}$. Decreasing $\frac{R_{1}}{R_{0}}$ leads to a greater reduction in the RMS value of the offset voltage. However, the zero-crossing points of the preamplifiers at the array edges are pulled by the averaging resistors towards the array (inwards). As a result, the mean values of $\operatorname{INL}(\mathrm{j})$ increase at the two edges of the array (Fig. 3.1). Hence, resistive averaging reduces the random zero-crossing error of the preamplifiers, but produces a systematic zero-crossing error for the edge preamplifiers. In [38], the two ends of the averaging network were cross-connected. Although cross-connection terminates the averaging resistor network, it does not solve the edge problem [15], because in practice the clipped preamplifiers on one end of the array shift the zero-crossing points of the edge preamplifiers at the other end outwards more than needed to restore the ideal zero-crossing positions. This can result in an INL error that is even higher than that of the case of abrupt termination (no cross-connection), especially for deep sub-micron designs [53].

Figure 3.1: Preamplifiers array edge problem: (a) transfer characteristics of preamplifiers, and (b) INL vs. output code.

The conventional solution to the edge problem is to add dummy preamplifiers at the two edges of the preamplifiers array beyond the input signal range, in addition to cross-connecting the ends of the extended averaging network, as signified in Fig. 3.2 [16, 17, 18]. The number of dummy preamplifiers needed at each edge of the preamplifiers array of a full-flash architecture is approximately equal to half of $W_{\text {Lin }}$, which is defined later in (3.10). The role of the dummy preamplifiers is to balance the shift in zero-crossing points of the edge preamplifiers (correct the mean values of $\operatorname{INL}(\mathrm{j})$ at the array edges), and to contribute to the offset averaging of the edge preamplifiers such that the RMS value of INL at the array edges is equal to that at the centre. However, the over-range dummies reduce the available voltage headroom for the input signal, as shown in Fig. 3.2. The loss in the voltage headroom due to the dummies is expressed as

$$
\begin{equation*}
\eta=\frac{V_{s i g}}{V_{s i g-\max }}, \tag{3.3}
\end{equation*}
$$

where $V_{\text {sig }}$ is the available voltage headroom for the input signal, and $V_{\text {sig-max }}$ represents the signal voltage headroom plus the over-range voltage. The number of needed dummies, and consequently, the over-range voltage headroom, depends on $\frac{R_{1}}{R_{0}}$. The higher the amount of offset reduction, the lower the value of $\eta$. In a previous design, the over-range voltage is about $28 \%$ of $V_{\text {sig }}$ [16].

Figure 3.2: Connecting the outputs of preamplifiers to average out the offset.

Based on (3.1)-(3.3), a measure for the accuracy of an ADC that uses averaging is derived as follows:

$$
\begin{equation*}
\left(\frac{\sigma_{o f f s e t}^{\prime}}{\Delta}\right)^{2} \approx \frac{A_{V t}^{2}}{W \cdot L} \frac{2^{2 N}}{V_{D D}^{2}} \frac{1}{\eta^{2} \alpha^{2} \xi^{2}}, \tag{3.4}
\end{equation*}
$$

where $\Delta$ represents the ADC LSB and $\alpha$ accounts for the voltage headroom consumed to ensure the proper biasing of the preamplifier, connected to the lowest reference voltage value, and that of the biasing circuit of the reference ladder, as shown in Fig. 3.3. The input capacitance of such an ADC is given by

$$
\begin{equation*}
C_{i n} \approx \frac{2}{3} C_{o x} W L \frac{1}{\rho m}\left(2^{N}-1\right), \tag{3.5}
\end{equation*}
$$

In (3.5), $m$ represents the interpolation factor and is equal to one if no interpolation is used, and $\rho$ is the number of preamplifiers within the signal range, divided by the total number of preamplifiers, including the dummies. Hence, the numerical value of $\rho$ is equal to that of $\eta$.

Figure 3.3: Voltage headroom distribution.

Figure 3.4: $\left(\frac{A_{V t}^{2} C_{o x}}{V_{D D}^{2}}\right)$ vs. technology minimum feature size $(L)$.

Eqs. (3.4) and (3.5) are combined, yielding a representation of the bandwidth-accuracy tradeoff of flash ADCs:

$$
\begin{equation*}
\kappa \equiv C_{i n}\left(\frac{\sigma_{o f f s e t}^{\prime}}{\Delta}\right)^{2} \approx\left(\frac{A_{V t}^{2} C_{o x}}{V_{D D}^{2}}\right)\left(\frac{2}{3} \frac{2^{2 N}\left(2^{N}-1\right)}{\eta^{2} \rho \alpha^{2}\left(\xi^{2} m\right)}\right) . \tag{3.6}
\end{equation*}
$$

For many applications, minimizing the input capacitance at the same target accuracy (minimizing $\kappa$ ) is a main design objective. An example of such an application is flash ADCs designed for wide band communication systems, where a large input bandwidth (and hence, a small $C_{i n}$ ), in addition to a certain accuracy, is dictated by the system specifications. Eq. (3.6) reveals that the higher the value of $\eta$, the lower the input capacitance at the same accuracy. This fact is exploited in Chapter 4 of this thesis to reduce $\kappa$ by terminating the averaging network without consuming the over-range voltage, so that $V_{s i g}$ is equal to $V_{\text {sig-max }}$.
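The relation between (3.4), (3.5), and (3.6) can be checked numerically with the following Python sketch. All parameter values are hypothetical and serve only to exercise the formulas. The only figure taken from the text is the over-range voltage of about $28\%$ of $V_{sig}$ reported for [16], which corresponds to $\eta=1/1.28$. Since $\rho$ is numerically equal to $\eta$, (3.6) implies that $\kappa$ scales as $1/\eta^{3}$ when all other terms are held fixed.

```python
def accuracy_term(a_vt, w, l, n_bits, v_dd, eta, alpha, xi):
    """Eq. (3.4): squared ratio of the averaged offset to the LSB."""
    return (a_vt**2 / (w * l)) * (2 ** (2 * n_bits) / v_dd**2) / (eta * alpha * xi) ** 2


def input_capacitance(c_ox, w, l, n_bits, rho, m):
    """Eq. (3.5): input capacitance of the preamplifier array."""
    return (2.0 / 3.0) * c_ox * w * l * (2**n_bits - 1) / (rho * m)


def kappa(a_vt, c_ox, v_dd, n_bits, eta, rho, alpha, xi, m):
    """Eq. (3.6): product of Eqs. (3.4) and (3.5); the gate area W*L cancels."""
    technology = a_vt**2 * c_ox / v_dd**2
    architecture = (2.0 / 3.0) * 2 ** (2 * n_bits) * (2**n_bits - 1) / (
        eta**2 * rho * alpha**2 * xi**2 * m
    )
    return technology * architecture


# Hypothetical values, chosen only to exercise the formulas.
A_VT, C_OX, V_DD = 5e-3, 1e-14, 1.2    # V*um, F/um^2, V
N_BITS, ALPHA, XI, M = 6, 0.6, 3.0, 1

ETA_DUMMIES = 1.0 / 1.28   # over-range voltage of 28% of V_sig, as in [16]
kappa_dummies = kappa(A_VT, C_OX, V_DD, N_BITS, ETA_DUMMIES, ETA_DUMMIES, ALPHA, XI, M)
kappa_no_dummies = kappa(A_VT, C_OX, V_DD, N_BITS, 1.0, 1.0, ALPHA, XI, M)
print(f"kappa without dummies / kappa with dummies = {kappa_no_dummies / kappa_dummies:.3f}")
```

Under these assumptions, removing the over-range voltage while keeping $\xi$ unchanged would roughly halve $\kappa$, which indicates how much is at stake in the termination of the averaging network.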


Figure 3.5: The highest reported ERBW for 6-bit single channel flash ADCs in different technologies.

The effect of technology scaling on $\kappa$ can be studied by plotting $\left(\frac{A_{V t}^{2} C_{o x}}{V_{D D}^{2}}\right)$ for different CMOS technologies (Fig. 3.4). Typical values for $A_{V t}, t_{o x}$ and $V_{D D}$ can be found in [54]. The remaining parameters in the RHS of (3.6) are technology independent ${ }^{2}$. Fig. 3.4 shows that for technologies with a minimum feature size less than $0.5 \mu \mathrm{~m}, \kappa$ increases with technology scaling. The reason behind this is that the continuous reduction in the supply voltage of technologies beyond $0.5 \mu \mathrm{~m}$ almost cancels out the expected improvement due to the enhancement of the matching properties of the FET transistor as both $A_{V t}$ and $V_{D D}$ scale nearly with the same factor. Since the accuracy depends on the ratio of the offset to the LSB, approximately the same FET area is needed in different submicron technologies to

[^7]achieve a certain accuracy. Thus, the increase in the value of $C_{o x}$, with scaling, would lead to a net increase in $\kappa$. Hence, as technology scales, a tighter bandwidth-accuracy tradeoff is obtained, and optimizing the technology-independent terms in (3.6) becomes crucial. The fact that technology scaling does not favour a higher bandwidth-accuracy product may also explain why the ERBWs of the reported ADCs have not increased significantly in submicron technologies below $0.35~\mu\mathrm{m}$, as shown in Fig. 3.5, even though different architecture enhancements have been devised. The simple analysis that led to (3.6) corrects the misconception in [12] that improved matching properties of future technologies would allow a larger bandwidth.
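The scaling argument can be reproduced with an idealized Python sketch. It assumes that $A_{Vt}$ and $V_{DD}$ scale by exactly the same factor $s$ and that $C_{ox}$ scales by $1/s$. This is a simplification of the trend described above, not data from [54], and all quantities are normalized to one at $s=1$.

```python
def tech_term(a_vt: float, c_ox: float, v_dd: float) -> float:
    """Technology-dependent factor of Eq. (3.6): A_Vt^2 * C_ox / V_DD^2."""
    return a_vt**2 * c_ox / v_dd**2


def scaled_tech_term(s: float) -> float:
    """Idealized scaling by s (s < 1 shrinks): A_Vt, V_DD -> s; C_ox -> 1/s."""
    return tech_term(s * 1.0, 1.0 / s, s * 1.0)


for s in (1.0, 0.7, 0.5):
    print(f"s = {s:.1f}: relative kappa = {scaled_tech_term(s):.2f}")
```

Under this assumption the factor grows as $1/s$: the matching improvement is cancelled by the lower supply voltage, and only the increase in $C_{ox}$ remains.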

Pan and Abidi predicted that the spread in the offset is defined by the total aggregate gate area of the preamplifiers array [15]. Therefore, an interpolating architecture and a full-flash architecture with the same aggregate gate area (and hence the same $C_{i n}$ ) would yield the same offset spread. That is, they would have the same $\kappa$. Hence, according to this claim, the expected drop in $\kappa$ due to using interpolation $(m>1)$ is counterbalanced by a drop in the $\xi^{2}$ value. After discussing time-interleaved flash ADCs in the next subsection, we show in Section 3.2 that the interpolating architecture can increase the $\left(m \xi^{2}\right)$ product, reducing $\kappa$ for the practical values of $\frac{R_{1}}{R_{0}}$ in deep-submicron technologies.

### 3.1.1 Time-Interleaving of Flash ADCs

The time-interleaving technique increases the effective sampling rate of an ADC system by a factor of K by operating K low-speed ADCs in a parallel fashion. Relaxing the sampling speed of the individual ADCs results in a considerable reduction in their power dissipation, and yields an overall power-efficient ADC system. In a time-interleaved architecture, the ADCs are preceded by T/H circuits [24] [23]. For low-resolution high speed applications, these $\mathrm{T} / \mathrm{H}$ circuits are implemented as a pass gate transistor switch that samples the signal and holds it on a capacitor [21]. The time-interleaved ADCs represent the load to the T/H circuits.

At any instant, only half of the K interleaved ADCs are in the track mode ${ }^{3}$, and hence

[^8]the sampling switches of their track-and-hold circuits are ON, so that K/2 ADCs load the input signal [24]. Neglecting the parasitic capacitance of the other sampling switches in the OFF mode, the total input capacitance, seen by the input signal, is $\mathrm{K} / 2$ times larger than the input capacitance of an individual ADC [55]. The pole, caused by the total input capacitance and the terminated line impedance, in addition to the pole of the $\mathrm{T} / \mathrm{H}$ circuit, would limit the whole system bandwidth [10]. Hence, the bandwidth of the resulting interleaved ADC would be less than the one that could have been obtained by one of its ADCs. High sampling speed interleaved flash ADCs have been reported but with an ERBW much less than the Nyquist frequency [10]. A special case arises when K equals 2: only one ADC loads the input signal at a time, and the bandwidth is preserved [21].

Thus, it can be inferred that the time-interleaved architecture inherently suffers from a large input capacitance and that minimizing $\kappa$ for flash ADCs, designed as a building block for a time-interleaved architecture, is essential to maintain the bandwidth.
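The loading argument can be illustrated with a short Python sketch. It models only the pole formed by the total input capacitance and the source resistance; the pole of the $\mathrm{T}/\mathrm{H}$ circuit and the OFF-switch parasitics are ignored. The capacitance and resistance values are hypothetical, and K is assumed to be even.

```python
import math


def interleaved_input_capacitance(k: int, c_adc: float) -> float:
    """Capacitance seen by the input when K/2 of the K ADCs track at once (K even)."""
    return max(1, k // 2) * c_adc


def pole_frequency(r_source: float, c_total: float) -> float:
    """Single-pole bandwidth of a source resistance driving c_total."""
    return 1.0 / (2.0 * math.pi * r_source * c_total)


C_ADC = 200e-15   # F, hypothetical input capacitance of one flash ADC
R_SRC = 25.0      # ohm, hypothetical: 50-ohm source in parallel with a 50-ohm termination

for k in (2, 4, 8):
    c_total = interleaved_input_capacitance(k, C_ADC)
    f_pole = pole_frequency(R_SRC, c_total)
    print(f"K = {k}: C_total = {c_total * 1e15:.0f} fF, input pole = {f_pole / 1e9:.1f} GHz")
```

In this simplified model the input pole drops in proportion to K/2, which is why a small $C_{in}$ per ADC matters more as the interleaving factor grows.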

### 3.2 Analysis of the Interpolating Architecture

Different expressions describing the expected improvement in INL and DNL, due to averaging, have been derived in $[14,38]$ and [56]. All these expressions are based on simple circuit analysis techniques. Nevertheless, the work of Pan and Abidi in [15] represents a rigorous mathematical analysis for the averaging network of the full-flash architecture as a spatial filter. However, Pan and Abidi use reasoning to predict the equivalence of averaging and interpolation. In this section, we use the same analytical approach to analyze the averaging-interpolating architecture in order to compare the results to those of the full-flash case assumed in [15]. It is shown that the prediction of [15] is a special case, at an impractical value for $\frac{R_{1}}{R_{0}}$, and that, in general, interpolation can lower the value of $\kappa$.

### 3.2.1 The $\times 2$ Interpolating Architecture

The preamplifiers array and the resistor averaging network of an interpolating architecture with $\times 2$ interpolation ( $m=2$ ) are modeled, as shown in Fig. 3.6(a). The averaging network is considered as a spatial filter, which is subjected to a current stimulus from
the transconductance of the preamplifiers. Two types of inputs are considered: $i_{\text {offset }}[n]$, which represents the random input offset current ${ }^{4}$ due to mismatch, and $i_{\text {sig }}[n]$ which is the signal current due to the ADC input voltage. The current $i_{\text {out }}[n]$ flowing through $R_{0}$ is the filter output. The impulse response of the spatial filter is obtained by applying a unit impulse current (in space domain), $i_{i n}$, at a node $n$ and finding the resulting output currents (Fig. 3.6(b))

$$
\begin{equation*}
i_{\text {in }}=\left(1+\frac{R_{0}}{R_{1}}\right) i_{\text {out }}[n]-\frac{R_{0}}{2 R_{1}}\left(i_{\text {out }}[n+2]+i_{\text {out }}[n-2]\right) . \tag{3.7}
\end{equation*}
$$

From (3.7), a relation between the input current and the output current is formulated in the $Z$-domain. By applying the inverse $Z$ transform, the impulse response is derived as follows (a complete analysis is presented in Appendix A):

$$
\begin{equation*}
h[n]=\frac{h[0]}{2} r^{-|n|}\left(1+(-1)^{n}\right), \tag{3.8}
\end{equation*}
$$

where

$$
\begin{equation*}
r=e^{\frac{1}{2} \cosh ^{-1}\left(1+\frac{R_{1}}{R_{0}}\right)} . \tag{3.9}
\end{equation*}
$$
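As a sanity check on (3.8) and (3.9), the following Python sketch substitutes the impulse response back into the right-hand side of (3.7) at nodes away from the excitation, where the injected current is zero. The function names are illustrative, and $h[0]$ is normalized to one because it does not affect the check.

```python
import math


def decay_ratio_x2(r1_over_r0: float) -> float:
    """r of Eq. (3.9)."""
    return math.exp(0.5 * math.acosh(1.0 + r1_over_r0))


def impulse_response_x2(n: int, r1_over_r0: float, h0: float = 1.0) -> float:
    """h[n] of Eq. (3.8); the factor (1 + (-1)^n) makes it vanish at odd n."""
    r = decay_ratio_x2(r1_over_r0)
    return 0.5 * h0 * r ** (-abs(n)) * (1 + (-1) ** abs(n))


def kcl_residual(n: int, r1_over_r0: float) -> float:
    """Right-hand side of Eq. (3.7) evaluated with h[n]; zero for even n != 0."""
    g = 1.0 / r1_over_r0   # R0 / R1

    def h(k: int) -> float:
        return impulse_response_x2(k, r1_over_r0)

    return (1 + g) * h(n) - 0.5 * g * (h(n + 2) + h(n - 2))


for n in (-4, -2, 2, 4, 6):
    print(f"n = {n:+d}: residual = {kcl_residual(n, 0.1):.2e}")
```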

In [15], it was shown that only the preamplifiers operating in their linear region would introduce stimuli to the filter. This, also, agrees with the work in [38], where it is assumed that only preamplifiers, operating in their linear range of operation, contribute to averaging. Therefore, the stimuli $i_{\text {sig }}[n]$ would take the shape shown in Fig. 3.7. To simplify the analysis, the input stimuli may be assumed rectangular in shape with a width $W_{\text {Lin }}$. Knowing that the linear range of an amplifier is approximately equal to $2 \sqrt{2}\left(V_{G S}-V_{T}\right)$,

$$
\begin{equation*}
W_{L i n}=\frac{2 \sqrt{2}\left(V_{G S}-V_{T}\right)}{\Delta}, \tag{3.10}
\end{equation*}
$$

where $W_{\text {Lin }}$ represents the number of filter nodes (taps) within the linear range of the preamplifier. Since the concern in this thesis is the input referred offset voltage, it is pivotal to relate it to the random offset current shown in Fig. 3.6(a). Both $i_{\text {offset }}$ and the input referred offset voltage, $v_{o f f s e t}$, are random quantities. Therefore, they are represented by their RMS values

[^9]
(a)

(b)

Figure 3.6: (a) The array of preamplifiers and averaging network of a $\times 2$ interpolating architecture modeled as a spatial filter. (b) Impulse response calculation.

$$
\begin{equation*}
\sigma_{o f f s e t}=\frac{\sigma i_{o f f s e t}}{g_{m}[0]} \tag{3.11}
\end{equation*}
$$

where $g_{m}[0]$ is the transconductance of the preamplifier at the edge of the thermometer code (node $n=0$ ). The assumption of rectangular current stimulus implies that all the preamplifiers, operating in their linear region, have a transconductance equal to $g_{m}[0]$. Based on (3.11), the reduction in the input referred offset, associated with averaging, is
expressed in terms of the offset current and the preamplifiers transconductance as [15]

$$
\begin{equation*}
\xi=\frac{\sigma_{\text {offset }}}{\sigma_{\text {offset }}^{\prime}}=\frac{g_{m}^{\prime}[0] / g_{m}[0]}{\sigma i_{\text {offset }}^{\prime} / \sigma i_{\text {offset }}}, \tag{3.12}
\end{equation*}
$$

where $g_{m}^{\prime}[0]$ represents the apparent preamplifier transconductance in the presence of the averaging resistors ( $R_{1}$ ), whereas $\sigma i_{\text {offset }}^{\prime}$ is the RMS value of the offset current after averaging.
The impulse response of (3.8) is convolved with the assumed rectangular current stimulus to yield

$$
\begin{equation*}
g_{m}^{\prime}[0]=\sum_{-\infty}^{\infty} g_{m}[n] h[0-n]=g_{m}[0] \sum_{-\frac{W_{L i n}-1}{2}}^{\frac{\left(W_{L i n}-1\right)}{2}} h[n] . \tag{3.13}
\end{equation*}
$$

Similarly, convolution can be used to express the resulting offset current in terms of the input offset current as

$$
\begin{equation*}
\sigma i_{o f f s e t}^{\prime}=\sqrt{\overline{i_{o f f s e t}^{\prime 2}}}=\sigma i_{o f f s e t} \sqrt{\sum_{-\frac{\left(W_{n}-1\right)}{2}}^{\frac{\left(W_{n}-1\right)}{2}} h^{2}[n]}, \tag{3.14}
\end{equation*}
$$

where $W_{n}$ is the width of the offset current stimulus. $W_{n}$ is equal to $W_{L i n}$, if the offset currents, due to the saturated preamplifiers tail current, are small enough, compared to that of the differential pair and can be neglected. Otherwise, if these offset currents have a considerable value and are assumed to be equal to the contribution from the preamplifiers working in their linear range of operation, then $W_{n}$ extends to the whole preamplifiers array.

Utilizing geometric series rules and (3.8) to evaluate the summations of (3.13) and (3.14) and substituting the result in (3.12), $\xi_{\times 2 I T P L}$ for a $\times 2$ averaging-interpolating structure is obtained as follows:

$$
\begin{equation*}
\xi_{\times 2 I T P L}=\frac{\left(1+r^{-2}\left(1-2 r^{-\left(W_{L i n}-1\right) / 2}\right)\right) /\left(1-r^{-2}\right)}{\sqrt{\left(1+r^{-4}\left(1-2 r^{-\left(W_{n}-1\right)}\right)\right) /\left(1-r^{-4}\right)}} . \tag{3.15}
\end{equation*}
$$



Figure 3.7: Current stimulus due to the input signal to the spatial filter formed of the interpolating network.

For the case of a full-flash architecture built with preamplifiers having a linear range equal to $W_{\text {Lin }}$ and whose consecutive preamplifiers outputs are connected by resistors equal to $R_{1}$ [15],

$$
\begin{equation*}
\xi_{F l a s h}=\frac{\left(1+l^{-2}\left(1-2 l^{-\left(W_{L i n}-1\right)}\right)\right) /\left(1-l^{-2}\right)}{\sqrt{\left(1+l^{-4}\left(1-2 l^{-2\left(W_{n}-1\right)}\right)\right) /\left(1-l^{-4}\right)}}, \tag{3.16}
\end{equation*}
$$

where

$$
\begin{equation*}
l=e^{\frac{1}{2} \cosh ^{-1}\left(1+\frac{R_{1}}{2 R_{0}}\right)} . \tag{3.17}
\end{equation*}
$$

Fig. 3.8 provides a comparison of $\xi_{\text {Flash }}$ and $\xi_{\times 2 I T P L}$ for $W_{\text {Lin }}=17$ under the two different assumptions for $W_{n}{ }^{5}$. When $W_{n}$ is assumed equal to $W_{\text {Lin }}$, more offset averaging is obtained for the lower values of $\frac{R_{1}}{R_{0}}$. As $\frac{R_{1}}{R_{0}}$ approaches zero, $\xi_{\text {Flash }}$ tends to $\sqrt{W_{\text {Lin }}}$, and $\xi_{\times 2 I T P L}$ tends to $\sqrt{W_{\text {Lin }} / 2}$. The fact that $\xi$ tends to these values agrees with (3.1),

[^10]because having $\frac{R_{1}}{R_{0}}=0$ is equivalent to shorting all the preamplifiers to a single amplifier, and the offset can be directly obtained from (3.1). On the other hand, if the offset stimulus is assumed to extend throughout the preamplifiers array, a value for $\frac{R_{1}}{R_{0}}$ exists below which the reduction in the $\frac{g_{m}^{\prime}}{g_{m}}$ ratio is faster than the offset current averaging, leading to a drop in $\xi$. Nonetheless, in both cases, Fig. 3.8 shows that $\xi_{\times 2 I T P L}$ is less than $\xi_{\text {Flash }}$ for any value of $\frac{R_{1}}{R_{0}}$. This is expected, since, for an interpolating architecture, fewer preamplifiers fall within the linear range of each preamplifier. In addition, a larger resistance $2 R_{1}$ than that of the full-flash case connects the outputs of consecutive preamplifiers. It is important to note that the assumption that consecutive amplifiers are connected by a resistor of value $2 R_{1}$ for the case of the $\times 2$ interpolating architecture, and $R_{1}$ for the case of the full-flash, keeps the value of $\eta$ in (3.6) almost the same for both architectures. In other words, both the full-flash architecture and the interpolating architecture would consume almost the same over-range voltage, needed for dummy preamplifiers to balance the zero-crossing shift of the edge preamplifiers.
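The closed forms (3.15) and (3.16) can be evaluated and cross-checked against a direct summation of the tap weights, as in the Python sketch below. The helper `xi_direct` and the array width of 63 used for the second assumption on $W_{n}$ are illustrative choices and do not come from the analysis above. For $W_{\text{Lin}}=17$, nine preamplifiers of the $\times 2$ array fall within the linear range, so the closed form tends to $\sqrt{9}$ as $\frac{R_{1}}{R_{0}}$ approaches zero, which is close to the approximate limit $\sqrt{W_{\text{Lin}}/2}$.

```python
import math


def xi_flash(w_lin, w_n, r1_over_r0):
    """Eq. (3.16) with l from Eq. (3.17)."""
    l = math.exp(0.5 * math.acosh(1.0 + r1_over_r0 / 2.0))
    num = (1 + l**-2 * (1 - 2 * l ** (-(w_lin - 1)))) / (1 - l**-2)
    den = math.sqrt((1 + l**-4 * (1 - 2 * l ** (-2 * (w_n - 1)))) / (1 - l**-4))
    return num / den


def xi_x2(w_lin, w_n, r1_over_r0):
    """Eq. (3.15) with r from Eq. (3.9)."""
    r = math.exp(0.5 * math.acosh(1.0 + r1_over_r0))
    num = (1 + r**-2 * (1 - 2 * r ** (-(w_lin - 1) / 2))) / (1 - r**-2)
    den = math.sqrt((1 + r**-4 * (1 - 2 * r ** (-(w_n - 1)))) / (1 - r**-4))
    return num / den


def xi_direct(m, w_lin, w_n, r1_over_r0):
    """Same ratio obtained by summing tap weights b**|k| over the preamplifiers in range."""
    b = math.exp(-math.acosh(1.0 + m * r1_over_r0 / 2.0))   # decay per preamplifier
    k_lin, k_n = (w_lin - 1) // (2 * m), (w_n - 1) // (2 * m)
    s1 = sum(b ** abs(k) for k in range(-k_lin, k_lin + 1))
    s2 = sum(b ** (2 * abs(k)) for k in range(-k_n, k_n + 1))
    return s1 / math.sqrt(s2)


W_LIN, W_ARRAY = 17, 63   # W_ARRAY: hypothetical width of the whole array
for ratio in (1.0, 0.1, 0.01):
    print(
        f"R1/R0 = {ratio:<5}: "
        f"flash {xi_flash(W_LIN, W_LIN, ratio):.3f} / {xi_flash(W_LIN, W_ARRAY, ratio):.3f}, "
        f"x2 {xi_x2(W_LIN, W_LIN, ratio):.3f} / {xi_x2(W_LIN, W_ARRAY, ratio):.3f}"
    )
```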

The results of circuit-level simulations, conducted in $0.13-\mu \mathrm{m}$ CMOS technology, are shown in Fig. 3.8. In these simulations, a full-flash preamplifier array is designed such that each preamplifier linear range extends through approximately 17 preamplifiers. The RMS value of the input referred offset is obtained from 400 Monte-Carlo runs for different values of $R_{1}$. The reciprocals of these values are multiplied by the input referred offset of a single isolated preamplifier to yield $\xi_{\text {Flash }}$. Afterwards, half of the preamplifiers are removed to produce a $\times 2$ interpolating architecture ${ }^{6}$ as in Fig. 3.7, and the simulations are repeated to evaluate $\xi_{\times 2 I T P L}$.

The $\left(m \xi^{2}\right)$ product is plotted in Fig. 3.9. Although $\xi^{2}$ is less for the interpolation case, the product $m \xi^{2}$ is higher for an interpolating architecture for any value of $\frac{R_{1}}{R_{0}}$, when $W_{n}$ is taken equal to $W_{\text {Lin }}$, except at $\frac{R_{1}}{R_{0}}=0$. If $W_{n}$ is assumed to be equal to the spatial width of the entire array, $W_{T}$, then there exists a non-zero value for $\frac{R_{1}}{R_{0}}$ that renders the interpolating architecture and the full-flash architecture equivalent (point A in Fig. 3.9). Nonetheless, in both cases, this occurs at a very low value for $\frac{R_{1}}{R_{0}}$. In deep-submicron technologies, the available voltage headroom is limited. Therefore, it is highly undesirable to use a low value for $\frac{R_{1}}{R_{0}}$, because this would result in a larger over-range voltage for the dummy preamplifiers. In addition, reducing $\frac{R_{1}}{R_{0}}$ to a very low value does not improve the accuracy because the resulting large over-range voltage limits the input signal headroom, leading to a smaller LSB. If the LSB value is decreased, the input referred offset becomes more comparable to it. This outweighs the benefit of increased averaging. Therefore, there is an optimum value for $\frac{R_{1}}{R_{0}}$. The full-flash ADC presented in [16] uses an optimum value for $\frac{R_{1}}{R_{0}}$ equal to 0.1.

[^11]

Figure 3.8: Offset reduction ratios $\xi_{F l a s h}$ and $\xi_{\times 2 I T P L}$ vs. $\frac{R_{1}}{R_{0}}$ for $W_{\text {Lin }}=17$.

Figure 3.9: $m \xi_{\text {Flash }}^{2}$ and $m \xi_{\times 2 I T P L}^{2}$ vs. $\frac{R_{1}}{R_{0}}$ for $W_{\text {Lin }}=17$; $m=1$ for a full-flash architecture and $m=2$ for a $\times 2$ interpolating architecture.

The percentage reduction in $\kappa$, due to $\times 2$ interpolation $(\Delta \kappa)$, is shown in Fig. 3.10 for different values of $W_{\text {Lin }}$. For $\frac{R_{1}}{R_{0}}=0.1$, a $\times 2$ interpolating architecture provides the same accuracy as that of the full flash but with a $17 \%$ lower input capacitance if $W_{\text {Lin }}=17$, whereas the reduction in $C_{i n}$ would be more than $25 \%$ if $W_{\text {Lin }}=33$. Note that for a given accuracy, a $\times 2$ interpolating architecture cannot reduce the input capacitance by $50 \%$, as might be expected when the number of preamplifiers is halved. However, a further reduction in $C_{i n}$ (at the same accuracy) can be attained if an architecture with a higher interpolation factor is adopted.

The fundamental reason that makes an interpolating architecture achieve a superior speed-accuracy product over that of the full-flash architecture can be explained with the aid of Fig. 3.11. It is assumed that a $\times 4$ interpolating architecture is constructed by lumping together the gate area of every four of the full-flash architecture preamplifiers to a single node. Thus, both structures would have the same input capacitance. While the interaction between the consecutive preamplifiers in the full-flash architecture is limited by the value of $R_{1}$, the interpolating architecture has every 4 of the full-flash preamplifiers connected with a short-circuit, maximizing the offset current cancellation among these preamplifiers. Hence, an interpolating architecture is similar to a full-flash architecture but with a short-circuit connecting some of its preamplifiers. As a result, an interpolating architecture, with the same input capacitance as a full-flash architecture, would have a better accuracy. This also explains why $\Delta \kappa$ in Fig. 3.10 drops as $R_{1}$ tends to zero; that is, at the same input capacitance, the accuracy of the full-flash architecture tends to that of an interpolating architecture as $R_{1}$ tends to a short circuit. Therefore, it is concluded that for a given input capacitance, a higher accuracy is achieved with the total allowed gate area budget lumped into a smaller number of amplifiers, because this allows the maximum averaging of the gate oxide non-idealities that cause the threshold mismatch. When the gate area is instead distributed over a large number of preamplifiers, the averaging level drops and becomes limited by the value of the averaging resistor.

It should be noted that the work in [56] has reached a similar conclusion, that is, reducing the number of input preamplifiers leads to a higher accuracy at the same input capacitance. However, the conclusion was reached by using an impractical assumption: all preamplifiers were assumed to be perfectly linear, such that the offset averaging is not limited by the linear range of the amplifiers. As a result, the quantitative results presented in [56], on which the final conclusion was based, are not valid for actual amplifiers with a limited linear range. This may explain why no attempt has been made to verify these quantitative results with circuit level simulations, as carried out in the work of this thesis. Also, the work in [56] does not fix the required over-range voltage when comparing different interpolation ratios, which adds another major source of error to the quantitative results. Nonetheless, the final conclusion of [56] remains correct.

Figure 3.10: Percentage improvement in accuracy-bandwidth tradeoff due to $\times 2$ interpolation $(\Delta \kappa \%)$ vs. $\frac{R_{1}}{R_{0}}$ : (a) $W_{\text {Lin }}=17$, and (b) $W_{\text {Lin }}=33$.

Figure 3.11: Lumping full-flash architecture preamplifiers to form an interpolating architecture.

### 3.2.2 Architectures with a Higher Interpolation Factor

In the previous subsection, an analysis for a $\times 2$ interpolating architecture has been conducted. The extension of this analysis to the case of $\times m$ is straightforward. In this case, the preamplifiers outputs are connected by $m R_{1}$ resistors and the number of preamplifiers contributing to averaging drops by a factor of $m$ compared to the full flash case. Hence,

$$
\begin{equation*}
\xi_{\times m-I T P L}=\frac{\left(1+\grave{l}^{-2}\left(1-2 \grave{l}^{-\left(W_{L i n}-1\right) / m}\right)\right) /\left(1-\grave{l}^{-2}\right)}{\sqrt{\left(1+\grave{l}^{-4}\left(1-2 \grave{l}^{-2\left(W_{n}-1\right) / m}\right)\right) /\left(1-\grave{l}^{-4}\right)}}, \tag{3.18}
\end{equation*}
$$

where

$$
\begin{equation*}
\grave{l}=e^{\frac{1}{2} \cosh ^{-1}\left(1+\frac{m R_{1}}{2 R_{0}}\right)} . \tag{3.19}
\end{equation*}
$$

Fig. 3.12 plots the $\Delta \kappa$ for a $\times 4$ interpolating architecture. As expected, the percentage reduction in $\kappa$ is higher for a $\times 4$ interpolating architecture than a $\times 2$ architecture. At $\frac{R_{1}}{R_{0}}=0.1$, the reduction in the input capacitance is approximately $30 \%$, if $W_{L i n}=17$, and more than $40 \%$ for $W_{\text {Lin }}=33$, at the same accuracy.
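The trend can be reproduced from (3.18) and (3.19) with a short Python sketch. It assumes $W_{n}=W_{\text{Lin}}$ and identical $\eta$, $\rho$, and $\alpha$ for both architectures, so that, by (3.6), $\kappa$ is proportional to $1/\left(m \xi^{2}\right)$. The function names are illustrative. The computed values should be read as estimates that follow the figures quoted in the text, not as a reproduction of the plotted curves.

```python
import math


def xi_interp(m, w_lin, w_n, r1_over_r0):
    """Eq. (3.18) with Eq. (3.19); m = 1 reduces to Eqs. (3.16) and (3.17)."""
    l = math.exp(0.5 * math.acosh(1.0 + m * r1_over_r0 / 2.0))
    num = (1 + l**-2 * (1 - 2 * l ** (-(w_lin - 1) / m))) / (1 - l**-2)
    den = math.sqrt((1 + l**-4 * (1 - 2 * l ** (-2 * (w_n - 1) / m))) / (1 - l**-4))
    return num / den


def delta_kappa(m, w_lin, r1_over_r0):
    """Fractional reduction in kappa relative to full flash (W_n = W_Lin assumed)."""
    flash = xi_interp(1, w_lin, w_lin, r1_over_r0) ** 2
    interp = m * xi_interp(m, w_lin, w_lin, r1_over_r0) ** 2
    return 1.0 - flash / interp


for m in (2, 4):
    for w_lin in (17, 33):
        dk = delta_kappa(m, w_lin, 0.1)
        print(f"x{m} interpolation, W_Lin = {w_lin}: delta kappa = {100 * dk:.1f} %")
```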

In the literature, a single stage with an interpolation factor greater than 4 has not been reported (except in [57]) to avoid the drawback of interpolation discussed in Subsection 2.3.1. However, the benefit of applying higher interpolation factors (fewer input preamplifiers, and hence, higher accuracy at the same input capacitance) can be partially exploited through cascading as in [56] and [2], where 6-bit ADCs are implemented using an array of nine preamplifiers at the input whose output is interpolated to 65 zero-crossings through a cascade of three consecutive $\times 2$ interpolating stages.

### 3.3 Preamplifiers Effective Gain in an Interpolating Architecture

The role of preamplifiers in flash ADCs is to introduce a certain voltage gain before the comparators to reduce the input referred offset. Therefore, it is necessary to study the effective gain of the preamplifiers in an interpolating network and how this gain compares to the full-flash case. The effective gain of the preamplifiers in a full-flash or an interpolating array is expressed as

$$
\begin{equation*}
A_{e f f}=g_{m}^{\prime} R_{0} \tag{3.20}
\end{equation*}
$$



where $g_{m}^{\prime}$ is given by

$$
\begin{equation*}
g_{m}^{\prime}=g_{m} \sum_{-\frac{W_{L i n}-1}{2}}^{\frac{\left(W_{L i n}-1\right)}{2}} I[n], \tag{3.21}
\end{equation*}
$$

where $g_{m}$ is the transconductance of an isolated preamplifier, and $I[n]$ is the impulse response of the corresponding network. As explained in Subsection 3.2.1, for an interpolating preamplifiers array, fewer preamplifiers fall within the linear range of each preamplifier, and a larger resistance connects consecutive preamplifiers than that of the full-flash architecture. Therefore, the two architectures would have different $I[n]$.

Figure 3.12: Percentage improvement in accuracy-bandwidth tradeoff due to $\times 4$ interpolation $(\Delta \kappa \%)$ vs. $\frac{R_{1}}{R_{0}}$ : (a) $W_{\text {Lin }}=17$, and (b) $W_{\text {Lin }}=33$.

In Fig. 3.13, the effective gain of the preamplifiers, normalized to the intrinsic gain of the preamplifier ( $A=g_{m} R_{0}$ ) for a full-flash preamplifiers array, is plotted as a function of $\frac{R_{1}}{R_{0}}$ for different $W_{\text {Lin }}$. The effective gain increases for a higher $W_{\text {Lin }}$, since more preamplifiers contribute to the effective gain. However, the effective gain drops for the lower values of $\frac{R_{1}}{R_{0}}$, because for these values, the impulse response of the resistive network extends beyond the linear range of the preamplifier, and hence, saturated preamplifiers (with zero transconductance) interact with the preamplifier under investigation, lowering its effective gain. The reduction in the effective gain of the preamplifiers with a lower $\frac{R_{1}}{R_{0}}$ represents another reason to avoid using low values of $\frac{R_{1}}{R_{0}}$.

The plot of Fig. 3.13 is followed to arrive at the effective gain of an interpolating architecture. Point A in Fig. 3.13 represents the normalized effective gain of a preamplifier in a full-flash array with 16 preamplifiers falling within its linear range and with output resistance $R_{0}$ equal to 10 times the averaging resistor $R_{1}$. For the case of a $\times 2$ interpolating network, only eight preamplifiers would be within the linear range of the amplifier (Arrow 1), and the resistance connecting consecutive amplifiers would be doubled (Arrow 2). Consequently, the effective gain of the preamplifier would be represented by point B, which has a $6 \%$ lower gain than point A.
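The move from point A to point B can be estimated with the Python sketch below. It sums an exponentially decaying impulse response over the preamplifiers within the linear range, as in (3.21). The peak value $I[0]$ is not given in this section; here it is derived from Kirchhoff's current law at the driven node of an infinitely long network, which is an assumption of this sketch. $W_{\text{Lin}}=17$ is used to approximate the case of 16 neighbouring preamplifiers.

```python
import math


def normalized_gain(m: int, w_lin: int, r1_over_r0: float) -> float:
    """A_eff / (g_m R_0): impulse response summed over the linear range."""
    x = m * r1_over_r0                          # resistance between neighbours, over R0
    b = math.exp(-math.acosh(1.0 + x / 2.0))    # decay per preamplifier
    i0 = 1.0 / (1.0 + (2.0 / x) * (1.0 - b))    # peak response, from KCL (derived here)
    k = (w_lin - 1) // (2 * m)                  # preamplifiers on each side within range
    return i0 * (1 + b - 2 * b ** (k + 1)) / (1 - b)


gain_a = normalized_gain(1, 17, 0.1)   # full flash, R0 = 10 * R1
gain_b = normalized_gain(2, 17, 0.1)   # x2 interpolation: half the taps, 2 * R1 between them
print(f"point A: {gain_a:.3f}, point B: {gain_b:.3f}, drop: {100 * (1 - gain_b / gain_a):.1f} %")
```

Under these assumptions the drop comes out at a few percent, the same order as the $6\%$ read from Fig. 3.13. The sketch also reproduces the two trends noted above: the gain approaches that of an isolated preamplifier for large $\frac{R_{1}}{R_{0}}$, and it increases with $W_{\text{Lin}}$.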

The reduction in the effective gain of the preamplifiers due to interpolation is plotted in Fig. 3.14 as a function of $\frac{R_{1}}{R_{0}}$ for the case of $\times 2$ and $\times 4$ interpolation, assuming a preamplifier with a linear range that extends to 32 preamplifiers when used in a full-flash resistive network. According to Fig. 3.14, the reduction in the effective gain due to interpolation remains below $5 \%$ for practical values of $\frac{R_{1}}{R_{0}}$ above 0.1. Therefore, it can be concluded that the improved accuracy at a given $C_{i n}$ for the interpolating architecture comes at the expense of a very small reduction in the effective gain of the preamplification stage.

Figure 3.13: The effective gain of the preamplifiers in the presence of the averaging network normalized to the isolated preamplifier DC gain vs $\frac{R_{1}}{R_{0}}$.

Figure 3.14: The reduction in the effective gain of preamplifiers due to interpolation as a function of $\frac{R_{1}}{R_{0}}$ for the case of $\times 2$ and $\times 4$ interpolation.

## Chapter 4

## Coping with the Lower Supply Voltages in Deep-Submicron Technologies

In this chapter, the problem of averaging network termination is addressed. After surveying previous termination methods, a new termination technique is proposed that eliminates the over-range voltage consumed by the dummy preamplifiers.

### 4.1 Previous Solutions

The loss of signal swing due to the over-range preamplifiers makes the averaging technique less suitable for low-voltage designs [30] or ultra-deep-submicron technologies that have a low nominal supply voltage. Thus, different solutions have been presented to reduce this over-range voltage penalty.

### 4.1.1 Reducing the Over-Range Voltage Headroom by Altering Averaging Resistors Value

The work in [58] and the expanded version in [56] model each preamplifier as a controlled voltage source with three components $\left(\left(V_{r e f(j)}-V_{i n}\right) A+V_{\text {offset }}\right)$ and an output resistance $R_{0}$, as in Fig. 4.1(a). Since the preamplifier is assumed to be linear, the superposition principle applies, and one component is considered at a time. To study the problem of averaging network termination, the $A V_{r e f(j)}$ component is considered, and the model of Fig. 4.1(a) reduces to that shown in Fig. 4.1(b). By using loop current analysis,

$$
\begin{equation*}
A \Delta V_{r e f}=\left(I_{z}-I_{z+1}\right) R_{0}+I_{z} R_{1}+\left(I_{z}-I_{z-1}\right) R_{0} \tag{4.1}
\end{equation*}
$$

For an infinite array, the network is symmetric and all the $I_{z \pm j}$ currents are equal. Hence, the current $I_{z}$ is given as

$$
\begin{equation*}
I_{z}=\frac{A \Delta V_{r e f}}{R_{1}} \tag{4.2}
\end{equation*}
$$

Thus, the problem of averaging termination reduces to the problem of forcing the loop current of the preamplifiers at the array edges to the current of (4.2). Fig. 4.1(c) shows the edge of a finite preamplifier array. The last averaging resistor value is altered to $R_{T}$. The value of $R_{T}$ that sets the loop currents to $I_{z}$ is derived as follows:

$$
\begin{equation*}
A \Delta V_{r e f}=\left(I_{1}-I_{2}\right) R_{0}+I_{1} R_{T}+I_{1} R_{0} \tag{4.3}
\end{equation*}
$$

From (4.2),

$$
\begin{equation*}
I_{z} R_{1}=\left(I_{1}-I_{2}\right) R_{0}+I_{1} R_{T}+I_{1} R_{0} \tag{4.4}
\end{equation*}
$$

Therefore, for $I_{1}$ and $I_{2}$ to be equal to $I_{z}$, $R_{T}$ must equal $R_{1}-R_{0}$. In other words, altering the value of the last averaging resistor (the averaging resistor connecting the dummy preamplifier) to $R_{1}-R_{0}$ terminates the network. It has been shown that if $R_{0}$ is greater than $R_{1}$ but less than $3 R_{1}$, then two dummy preamplifiers at each edge of the preamplifier array would terminate the network. The averaging resistor connecting the dummy preamplifier that directly follows the last in-range preamplifier would be equal to $\left(\frac{3}{2} R_{1}-\frac{1}{2} R_{0}\right)$, and the second dummy preamplifier would be connected through a short circuit. For higher values of $R_{0}$, more dummy preamplifiers are needed. According to this technique of [56], the number of dummy preamplifiers required at each of the array edges to terminate the averaging network is

$$
\begin{equation*}
p=\left\lceil\frac{R_{0}}{R_{1}}\right\rceil . \tag{4.5}
\end{equation*}
$$
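As a numerical illustration of (4.2) to (4.5), the short sketch below checks that the edge-loop equation (4.3) is satisfied when $I_{1}=I_{2}=I_{z}$ and $R_{T}=R_{1}-R_{0}$, and evaluates (4.5). The component values and function names are hypothetical and chosen only for illustration; they are not taken from [56] or [58].

```python
import math

def infinite_array_loop_current(a_dvref, r1):
    """Loop current of the infinite array, Eq. (4.2): I_z = A*dVref / R1."""
    return a_dvref / r1

def edge_loop_voltage(i1, i2, r0, r_t):
    """Right-hand side of the edge-loop equation, Eq. (4.3)."""
    return (i1 - i2) * r0 + i1 * r_t + i1 * r0

def dummies_per_edge(r0, r1):
    """Number of dummy preamplifiers per array edge, Eq. (4.5)."""
    return math.ceil(r0 / r1)

# Hypothetical values: A*dVref = 30 mV, R0 = 1 kOhm, R1 = 1.5 kOhm.
a_dvref, r0, r1 = 0.030, 1000.0, 1500.0
i_z = infinite_array_loop_current(a_dvref, r1)
r_t = r1 - r0  # termination resistor derived from Eq. (4.4)
residual = edge_loop_voltage(i_z, i_z, r0, r_t) - a_dvref

print(f"I_z = {i_z * 1e6:.1f} uA, R_T = {r_t:.0f} Ohm, residual = {residual:.1e} V")
print("dummies per edge:", dummies_per_edge(r0, r1))
```

A residual of essentially zero indicates that the edge loop carries the same current as a loop of the infinite array, which is the termination condition stated above.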


(a)

(b)

(c)

Figure 4.1: (a) The model of the averaging network and preamplifiers for an infinite array, (b) equal subcircuit currents flowing in the case of an infinite array, and (c) the model for a finite array.

So, altering the value of the averaging resistors at the array edges reduces the number of dummies needed to correct for the deviation in the zero-crossing points of the preamplifiers connected to reference voltages within the input signal range.

The chip reported in [58] had 9 preamplifiers at its input in addition to 2 dummy preamplifiers. Therefore the chip consumes $\frac{2}{9}$ (more than $20 \%$ ) over-range voltage, and a supply of 1.95 V is used for the analog part of the ADC in $0.18-\mu \mathrm{m}$ CMOS technology. Thus, the supply voltage is still higher than the technology nominal supply voltage.

### 4.1.2 Over-Range Voltage Headroom Elimination by Triple Cross-Connection

An alternative solution to the problem of averaging network termination is to use the triple cross-connection technique [21]. The main idea of this work is to balance the currents of the edge preamplifiers (the small black dots in Fig. 4.2) that cause the zero-crossing shift with that of an interface amplifier (large black dot) connected to an in-range reference voltage ( $V_{r e f(4)}$ in Fig. 4.2). As shown in Fig. 4.2, the interface amplifier must have a negative transconductance with respect to the regular preamplifiers to cancel out the effect of the edge preamplifiers. Therefore, in [21], the outputs of the interface amplifier are cross-connected to the outputs of the regular preamplifiers, as denoted in Fig. 4.3(a). In addition, pre-distorted reference voltages, generated by digitally controlled current sources, have been used to further reduce the residual INL deviation.

This method reduced the peak mean value of the INL from 5 LSB for abrupt termination to 0.3 LSB according to simulations reported in [21]. The triple cross-connection technique does not need an over-range voltage. However, the pre-distorted reference ladder increases the design complexity. In [53], the pre-distorted reference ladder was eliminated, and triple cross-connection and an interface amplifier were used (Fig. 4.3(b)). Nevertheless, the resulting peak mean value of the INL, obtained by simulation, dropped only from 4.5 LSB in the case of abrupt termination to 0.5 LSB. This value is relatively high, since random variations would still add to it to yield the total INL error value. The termination technique proposed in this thesis (Section 4.2) leads to a much smaller residual INL deviation, as described in Subsection 5.1.3, where the INL drops from 4.13 LSB for abrupt termination to 0.15 LSB.

Figure 4.2: Transfer characteristics of the edge preamplifiers and the interface amplifier.

Figure 4.3: (a) Terminating the averaging network using an interface amplifier and predistorted reference ladder, and (b) terminating the averaging network using an interface amplifier only.

A major drawback of the triple cross-connection method is that it introduces negative transconductance at the preamplifier array edges. As a result, the effective transconductance, gain, and gain-bandwidth of the preamplifiers at the array edges are lowered. To alleviate this effect, the zero-reference point used for the interface amplifier in [53] is chosen to be three steps away from the end of the reference ladder. Nonetheless, this limits the amount of residual INL reduction that can be achieved. Also, the interface amplifier needed for the triple cross-connection method should have a larger linear range than that of the regular amplifier. Therefore, it becomes harder to match the interface amplifier to the regular amplifier and to maintain the same performance across process and supply variations.

The termination technique proposed in this thesis is a modification of that of [58], which was presented in Subsection 4.1.1. The technique neither consumes over-range voltage headroom nor introduces negative transconductance, and therefore provides an efficient solution to the problem of averaging network termination.

### 4.2 The Proposed Termination Technique

The concept of the proposed termination circuit is introduced in Subsection 4.2.1, and the details of the circuit implementation are presented in Subsection 4.2.2.

### 4.2.1 Concept

Figure 4.4: Averaging network termination using dummy preamplifiers.

In Fig. 4.4, only one edge of the preamplifier array is shown, and it is assumed that $D$ dummy preamplifiers per edge are used to eliminate the edge problem. The number of needed dummy preamplifiers is reduced by altering the value of the averaging resistors at the edge, as suggested in [56] (Fig. 4.5(a)). The output voltage of the dummy amplifier in Fig. 4.5(a) is described as (Fig. 4.5(b))

$$
\begin{equation*}
V_{\text {dummy }}=\left[g_{m}\left[V_{\text {in }}-\left(V_{\text {ref }\left(2^{N}-1\right)}+d \triangle\right)\right]+I_{x}\right] R_{0}, \tag{4.6}
\end{equation*}
$$

where $V_{\text {ref }\left(2^{N}-1\right)}$ is the maximum in-range reference voltage, $I_{x}$ is the differential current flowing through the termination resistors $R_{T}$, and $d$ is an integer less than $D$. To eliminate the over-range reference voltage $\left(V_{\text {ref }\left(2^{N}-1\right)}+d \triangle\right)$, the interface amplifier of Fig. 4.6(a) together with the in-range reference voltage $\left(V_{\text {ref }\left(2^{N}-1\right)}-d \triangle\right)$ are used instead. Therefore, as shown in Fig. 4.6(b), the output voltage of the interface amplifier is given by

$$
\begin{align*}
V_{\text {interface }} &= \left[g_{m}\left[V_{\text {in }}-V_{\text {ref }\left(2^{N}-1\right)}\right]+g_{m}\left[\left(V_{\text {ref }\left(2^{N}-1\right)}-d \triangle\right)-V_{\text {ref }\left(2^{N}-1\right)}\right]+I_{x}\right] R_{0} \\
&= \left[g_{m}\left[V_{\text {in }}-\left(V_{\text {ref }\left(2^{N}-1\right)}+d \triangle\right)\right]+I_{x}\right] R_{0}. \tag{4.7}
\end{align*}
$$

So, $V_{\text {interface }}$ is equal to $V_{\text {dummy }}$ in (4.6), and an interface amplifier with the connectivity in Fig. 4.6 can replace the structure in Fig. 4.5.
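The equivalence of (4.6) and (4.7) can be checked numerically. The following is a minimal sketch under assumed values: the transconductance, load resistance, reference level, LSB step, and termination current are hypothetical and serve only to exercise the two expressions over a sweep of $V_{in}$.

```python
def v_dummy(v_in, v_ref_top, d, delta, g_m, r0, i_x):
    """Eq. (4.6): dummy preamplifier tied to the over-range reference."""
    return (g_m * (v_in - (v_ref_top + d * delta)) + i_x) * r0

def v_interface(v_in, v_ref_top, d, delta, g_m, r0, i_x):
    """First line of Eq. (4.7): two transconductors, in-range references only."""
    signal_path = g_m * (v_in - v_ref_top)
    shift_path = g_m * ((v_ref_top - d * delta) - v_ref_top)  # constant shift
    return (signal_path + shift_path + i_x) * r0

# Hypothetical operating point (not taken from the design).
params = dict(v_ref_top=1.0, d=1, delta=0.013, g_m=1e-3, r0=1000.0, i_x=1e-5)
sweep = [0.5 + 0.01 * k for k in range(101)]  # V_in from 0.5 V to 1.5 V
max_diff = max(abs(v_dummy(v, **params) - v_interface(v, **params)) for v in sweep)
slope = v_interface(1.1, **params) - v_interface(1.0, **params)

print(f"largest difference over sweep: {max_diff:.1e} V")
print(f"output change for +0.1 V input: {slope:.3f} V")
```

The positive output change for an increasing input reflects the absence of a negative transconductance; the second transconductor contributes only a constant term.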

(a)

(b)

Figure 4.5: The Termination technique of [22]. (a) Actual circuit. (b) Simplified model.

Unlike the work in [53] and [21], the output of the interface amplifier, $V_{\text {interface }}$, does not decrease with the increase in $V_{i n}$; that is, the proposed interface amplifier does not add a negative transconductance. The interface amplifier principle of operation is to shift its zero-crossing point by adding a constant voltage to its output.

### 4.2.2 Circuit Level Implementation

Typically, flash ADC preamplifiers are implemented as a simple differential pair with resistive loads (Fig. 4.7) [30] [56]. The interface amplifier used in this thesis is shown in Fig. 4.8. It consists of two differential pairs representing the two transconductors of the interface amplifier model of Fig. 4.6(b). Two load current sources are used to supply the extra bias current needed by the differential pairs, keeping the value of the currents flowing in the resistors $R_{0}$ equal to that of the regular preamplifier of Fig. 4.7.

Figure 4.6: The proposed termination scheme: (a) actual circuit, and (b) simplified model.

The interface amplifier should have the same output impedance, gain, Gain-Bandwidth (GBW), and output common mode voltage as those of the array preamplifiers. Therefore, $(W / L)$ of the input differential pair and the tail current sources of the interface amplifier are set equal to those of the preamplifiers. As a result, both the preamplifiers and the interface amplifier would have equal transconductance $\left(g_{m}\right)$. The current sources connected in parallel to the load resistors $R_{0}$ are designed to have a much larger output impedance than the load resistor $R_{0}$. Hence, both circuits would have almost the same output impedance $R_{0}$ and gain $\left(g_{m} R_{0}\right)$. Assuming that the load capacitance is dominated by the input capacitance of the next stage, the two circuits would have approximately equal bandwidth. The DC current flowing through $R_{0}$ in both circuits is $I$, so the output common mode voltage of both circuits is equal to $\left(V_{D D}-I R_{0}\right)$.

Figure 4.7: The preamplifier.

Figure 4.8: The interface amplifier.

Typically, the input referred RMS offset value of the interface amplifier of Fig. 4.8 would be $\sqrt{2}$ times the offset value of the regular preamplifier, if they are designed to have the same input capacitance. That is because, for the interface amplifier, two differential pairs contribute to the input referred random offset, or, in other words, the two differential pairs of the interface amplifier are not perfectly matched causing a higher input referred offset. This would degrade the input referred offset at the preamplifiers array edges. To avoid this degradation, in the actual implementation, the interface amplifier differential pairs are sized such that

$$
\begin{equation*}
\frac{\left(W_{\text {interface }}\right)}{\left(L_{\text {interface }}\right)}=\frac{\left(\sqrt{2} W_{\text {preamp }}\right)}{\left(\sqrt{2} L_{\text {preamp }}\right)} . \tag{4.8}
\end{equation*}
$$

Hence, the interface amplifier would have the same RMS offset as the regular preamplifier but double its input capacitance ${ }^{1}$. That is to say, sizing the interface amplifier according to (4.8) matches the two differential pairs of the interface amplifier to the required accuracy level.
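The effect of the sizing rule (4.8) can be illustrated with a simple area-scaling sketch. It assumes a Pelgrom-type mismatch model in which the RMS offset of one differential pair is proportional to $1 / \sqrt{W L}$, uncorrelated offsets for the two pairs, and an input capacitance proportional to the gate area $W L$. The matching coefficient and device dimensions below are hypothetical.

```python
import math

def pair_offset(a_vt, w, l):
    """RMS offset of one differential pair (assumed model: A_vt / sqrt(W*L))."""
    return a_vt / math.sqrt(w * l)

def interface_offset(a_vt, w, l):
    """Two uncorrelated pairs add in power, giving a sqrt(2) penalty."""
    return math.sqrt(2.0) * pair_offset(a_vt, w, l)

# Hypothetical preamplifier device: A_vt in mV*um, W and L in um.
a_vt, w_pre, l_pre = 5.0, 10.0, 0.25

# Sizing per Eq. (4.8): scale both W and L by sqrt(2), keeping W/L fixed.
w_int, l_int = math.sqrt(2.0) * w_pre, math.sqrt(2.0) * l_pre

sigma_pre = pair_offset(a_vt, w_pre, l_pre)
sigma_same_size = interface_offset(a_vt, w_pre, l_pre)
sigma_resized = interface_offset(a_vt, w_int, l_int)
area_ratio = (w_int * l_int) / (w_pre * l_pre)  # proxy for input capacitance

print(f"preamplifier offset:        {sigma_pre:.3f} mV")
print(f"interface, same device size: {sigma_same_size:.3f} mV")
print(f"interface, sized per (4.8):  {sigma_resized:.3f} mV")
print(f"gate-area (capacitance) ratio: {area_ratio:.2f}")
```

Because $W / L$ is unchanged, the transconductance at a given bias current is preserved in this model, while the doubled gate area is what restores the offset.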

The circuits shown in Fig. 4.7 and Fig. 4.8 assume a single-ended input signal for simplicity. In the actual implementation, all the circuits are fully differential, as demonstrated in the next chapter.

[^12]
## Chapter 5

## A 6-bit 1.6-GS/s Low Power Broadband Flash ADC Converter in 0.13- $\mu \mathrm{m}$ CMOS Technology

This chapter details the implementation of a 6-bit 1.6-GS/s low-power broadband flash ADC that incorporates the termination technique proposed in Section 4.2. The ADC is implemented in a $0.13-\mu \mathrm{m}$ 8-metal single-poly CMOS technology.

### 5.1 The Analog Front End

A block diagram of the ADC analog front end is illustrated in Fig. 5.1. In order to maximize the bandwidth of the preamplification stage of a flash ADC, it is commonly realized by employing low-gain amplifiers. A cascade of two to four stages of these low-gain amplifiers is usually needed to suppress the relatively large dynamic offset of the comparators [21] [56] [2]. In this design, four preamplification stages are chosen, and a $\times 2$ averaging interpolating network connects the consecutive stages. The preamplification stage is designed such that the total input referred offset RMS value is less than $1 / 4 \mathrm{LSB}$, as required for 6 bits of resolution. The preamplification stage is preceded by an open-loop $\mathrm{T} / \mathrm{H}$ circuit to enhance the dynamic performance of the ADC. The circuit design details of the main building blocks are introduced in the following subsections.


Figure 5.1: Analog front end of the ADC.

### 5.1.1 The Track-and-Hold Circuit

The Track-and-Hold (T/H) circuit consists of an NMOS sampling switch and a $153-\mathrm{fF}$ MIM capacitor (Fig. 5.2). The input voltage range of the flash ADC is set to 0.84 V p-p differential, and the common mode voltage of the input signal is set to 0.27 V to lower the $\mathrm{T} / \mathrm{H}$ sampling switch resistance. A source follower buffer following the $\mathrm{T} / \mathrm{H}$ circuit shifts the input common mode voltage to 0.88 V and consumes 10 mA.

For a target resolution of 6 bits, assuming a first-order sampling circuit and requiring that the error due to incomplete settling of the sampling circuit be kept below $10 \%$ of the LSB, the time constant of the sampling circuit is related to the acquisition time by

$$
\begin{equation*}
\left(1-\frac{\frac{1}{2^{6}}}{10}\right)<\left(1-e^{-\frac{T_{a}}{\tau_{\text {track }}}}\right) \tag{5.1}
\end{equation*}
$$

where $\tau_{\text {track }}$ is the time constant of the sampling circuit, and $T_{a}$ is the acquisition time. Therefore, $\tau_{\text {track }}$ has to be less than approximately 33 ps for $T_{a}$ equal to $218.75 \mathrm{ps}^{1}$. Detailed analysis of the source follower circuit leads to a second-order transfer function [23]. Therefore, the obtained value for $\tau_{\text {track }}$ is used only to provide an initial guess for the required bandwidth of the entire $\mathrm{T} / \mathrm{H}$ circuit. The $\mathrm{T} / \mathrm{H}$ circuit bandwidth, obtained from simulations, is 7 $\mathrm{GHz}^{2}$.
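The bound on $\tau_{\text {track }}$ follows from requiring the first-order settling error $e^{-T_{a} / \tau_{\text {track }}}$ to stay below one tenth of an LSB at 6 bits. The sketch below evaluates that bound for the quoted acquisition time. The conversion to a $-3$ dB bandwidth assumes a single-pole response, which, as noted above, is only an initial estimate; the function name is illustrative.

```python
import math

def max_tracking_time_constant(t_acq, n_bits, lsb_fraction):
    """Largest tau for which exp(-t_acq / tau) stays below
    lsb_fraction of one LSB (full scale normalized to 1)."""
    allowed_error = lsb_fraction / 2 ** n_bits
    return t_acq / math.log(1.0 / allowed_error)

T_ACQ = 218.75e-12  # acquisition time quoted in the text, in seconds
tau_max = max_tracking_time_constant(T_ACQ, n_bits=6, lsb_fraction=0.1)

# Single-pole assumption: f_3dB = 1 / (2 * pi * tau).
bw_min = 1.0 / (2.0 * math.pi * tau_max)

print(f"tau_track upper bound: {tau_max * 1e12:.2f} ps")
print(f"corresponding single-pole bandwidth: {bw_min / 1e9:.2f} GHz")
```

Under these assumptions the required single-pole bandwidth is a few gigahertz, which is consistent with the simulated $\mathrm{T} / \mathrm{H}$ bandwidth leaving margin for second-order effects.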

In the circuit of Fig. 5.2, the main sources of distortion are:

- the sampling switch nonlinear resistance
- the signal-dependent charge injection at switch opening
- the signal-dependent sampling instant
- the source follower nonlinearity
- the signal-dependent input capacitance of the source follower.

[^13]

Figure 5.2: The T/H circuit.

To suppress the charge injected when the sampling switch is turned OFF, a dummy switch, driven by the complement of the clock, is connected to the hold capacitor. Simulations are used to size the dummy switch such that the charge injected by the sampling switch is absorbed by the dummy switch [59]. Also, to reduce the source follower nonlinearity, a replica source follower is adopted to bias the N-well of the source follower main PMOS transistor [21]. This configuration improves the linearity of the source follower, because it eliminates the threshold voltage modulation due to the signal-dependent source-well voltage without having the output drive the large nonlinear well-substrate capacitance. The replica source follower transistors are $\frac{1}{20}$ the size of the corresponding transistors of the main source follower ${ }^{3}$. The nonlinearity arising from the source follower input capacitance variation with the input signal is reduced by inserting the MIM capacitor. Since the MIM capacitor has a fixed value that does not vary with the input signal, the total input capacitance seen by the sampling switch varies by a lower percentage in the presence of the MIM capacitor.

[^14]

The $\mathrm{T} / \mathrm{H}$ nonlinearity adds to the quantizer noise, deteriorating the overall ADC output SNDR according to ${ }^{4}$

$$
\begin{equation*}
S N D R=\frac{1}{\frac{1}{S D R_{T}}+\frac{1}{S N R_{Q}}}, \tag{5.2}
\end{equation*}
$$

where $S D R_{T}$ is the $\mathrm{T} / \mathrm{H}$ circuit signal-to-distortion ratio and $S N R_{Q}$ is the signal-to-quantization-noise ratio of the quantizer. Eq. (5.2) is plotted in Fig. 5.3 for a 6-bit quantizer, from which it can be concluded that a $\mathrm{T} / \mathrm{H}$ circuit with an SDR above 50 dB is required so as not to limit the performance of the 6-bit quantizer. Therefore, the $\mathrm{T} / \mathrm{H}$ circuit of this work is designed for an SDR higher than 50 dB.
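To give a feel for (5.2), the sketch below evaluates the combined SNDR for a few $\mathrm{T} / \mathrm{H}$ SDR values. The ratios in (5.2) are treated as power ratios and converted to and from decibels, and the ideal quantizer SNR is taken from the standard full-scale sine-wave expression $6.02 N+1.76$ dB; both are assumptions of this illustration rather than statements from the design.

```python
import math

def db_to_ratio(value_db):
    """Convert a power ratio from decibels to linear units."""
    return 10.0 ** (value_db / 10.0)

def ratio_to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)

def ideal_snr_db(n_bits):
    """Ideal quantization SNR for a full-scale sine wave (assumed formula)."""
    return 6.02 * n_bits + 1.76

def combined_sndr_db(sdr_t_db, snr_q_db):
    """Eq. (5.2) evaluated with linear power ratios."""
    total = 1.0 / db_to_ratio(sdr_t_db) + 1.0 / db_to_ratio(snr_q_db)
    return ratio_to_db(1.0 / total)

snr_q = ideal_snr_db(6)
for sdr_t in (40.0, 50.0, 60.0):
    sndr = combined_sndr_db(sdr_t, snr_q)
    print(f"SDR_T = {sdr_t:.0f} dB -> SNDR = {sndr:.2f} dB "
          f"(loss {snr_q - sndr:.2f} dB)")

sndr_at_50 = combined_sndr_db(50.0, snr_q)
```

With these assumptions, an SDR of 50 dB costs only a fraction of a decibel relative to the ideal 6-bit quantizer, whereas an SDR of 40 dB costs about 2 dB.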

Fig. 5.4 displays the input signal of frequency 706.25 MHz and the sampled signal at the source follower output node in the time domain when the T/H circuit is clocked at 1.6 GS/s. The spectrum of the sampled signal is shown in Fig. 5.5. The pseudo-differential source follower buffer achieves a $3^{\text {rd }}$ Harmonic Distortion (HD3) of -53 dBc. The simulation is carried out with the nominal transistor model, and therefore, it does not take into account the transistor mismatch that would lead to a second-order harmonic.

### 5.1.2 The Reference Ladder

The reference voltages of the flash ADC are generated by an $80-\Omega$ serpentine structure of Metal 3. The low resistance of the resistor ladder reduces the reference ladder signal feedthrough problem.

[^15]

Figure 5.3: ADC SNDR vs. T/H circuit SDR assuming a 6 -bit ideal quantizer following the $\mathrm{T} / \mathrm{H}$ circuit.


Figure 5.4: The Sampled signal at the output of the track-and-hold buffer.


Figure 5.5: Spectrum of the sampled signal.

### 5.1.3 Preamplification Stages

## The Preamplifiers

The first stage preamplifiers (PREAMP1) are difference differential amplifiers (Fig. 5.6) that compare the differential input to the generated differential reference voltages. The input common mode range of PREAMP1 extends from 0.6 V to 1.19 V for the nominal transistor process parameters. Simulations indicate that $V_{\text {ref+ }}$ and $V_{\text {ref- }}$ remain within the input common mode range of PREAMP1 for all the process corners.

Figure 5.6: First stage preamplifier.

Monte Carlo simulations are used to estimate the input referred static offset of the preamplifier. In this simulation, the input voltage to the preamplifier is increased gradually in a DC sweep simulation, and the input voltage value at the output zero-crossing is recorded for each Monte Carlo run. Fig. 5.7 shows the resulting histogram. The standard deviation of the input referred offset is 4 mV. If $A_{V t}$ is used to calculate the offset, then

$$
\begin{equation*}
\sigma_{o f f s e t} \approx \sqrt{2} \frac{A_{V t}}{\sqrt{W \cdot L}}=\sqrt{2} \frac{5.36 \mathrm{mV} \cdot \mu \mathrm{~m}}{\sqrt{18.08 \mu \mathrm{~m} \cdot 0.25 \mu \mathrm{~m}}}=3.56 \mathrm{mV} \tag{5.3}
\end{equation*}
$$

Thus, the transistor threshold voltage mismatch dominates the total offset voltage for the designed preamplifier.
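The arithmetic of (5.3) can be reproduced directly from the quoted device dimensions. In the sketch below, the $\sqrt{2}$ factor is read as accounting for the two differential pairs of the difference amplifier, which is an interpretation rather than something stated explicitly; the comparison with the 4-mV Monte Carlo result is expressed as a share of the offset variance.

```python
import math

A_VT = 5.36        # threshold matching coefficient, mV*um (from Eq. 5.3)
W_UM = 18.08       # input device width, um (from Eq. 5.3)
L_UM = 0.25        # input device length, um (from Eq. 5.3)
SIGMA_MC_MV = 4.0  # Monte Carlo standard deviation quoted in the text, mV

# Assumed reading: two uncorrelated differential pairs give the sqrt(2) factor.
sigma_vt_mv = math.sqrt(2.0) * A_VT / math.sqrt(W_UM * L_UM)
variance_share = (sigma_vt_mv / SIGMA_MC_MV) ** 2

print(f"threshold-mismatch offset: {sigma_vt_mv:.2f} mV")
print(f"share of simulated offset variance: {variance_share:.0%}")
```

Under this reading, threshold mismatch accounts for roughly four fifths of the simulated offset variance, in line with the statement that it dominates the total offset.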

The fully differential interface amplifier for terminating the averaging network is given in Fig. 5.8. The sizing of the interface amplifier follows the discussion presented in Subsection 4.2.2. The interface amplifier and the whole preamplification stage are biased by the bias circuit of Fig. 5.9. An off-chip resistor controls the bias circuit for testing purposes. Fig. 5.10 shows the preamplifiers of the second, third, and fourth stages, which are simple differential amplifiers with resistive loads. The current consumed by each of the preamplifiers at each stage is half that of the preceding stage.

## Averaging and Interpolation



Figure 5.7: Input referred offset of first stage preamplifier.


Figure 5.8: The interface amplifier.


Figure 5.9: The bias circuit for the preamplifiers and interface amplifier.


Figure 5.10: The preamplifier of the second, third, and fourth stages.

Table 5.1: Voltage gain, 3-dB bandwidth, input referred offset, and offset reduction factor of each preamplifiers stage.

|  | Voltage gain (A) | $3-\mathrm{dB}$ bandwidth in <br> GHz | Input referred offset <br> RMS value without <br> averaging $(\sigma)$ in <br> mV | Offset reduction <br> ratio due to <br> averaging $(\xi)$ |
| :--- | :---: | :--- | :--- | :--- |
| First stage | 2.7 | 3.7 | 4 | 1.6 |
| Second stage | 3.2 | 6.1 | 6.51 | 1.6 |
| Third stage | 2.9 | 7.7 | 11.92 | 2 |
| Fourth stage | 2.9 | 5.2 | 15.37 | - |

For each stage, the value of the averaging resistors $R_{1}$ is selected to be 1.5 times larger than the value of the preamplifier load resistance $R_{0}$. For the first stage $R_{T}$ is set to $\left(R_{1}-R_{0}\right)$ [56]. Table 5.1 lists the voltage gain, 3-dB bandwidth, input referred offset RMS value, and the offset reduction ratio for each of the four stages. The total input capacitance of the preamplification stage is 380 fF , and the total input referred offset RMS value due to the four preamplifiers stages and the digital back end comparators is equal to

$$
\begin{equation*}
\sigma_{\text {offset-total }}^{\prime}=\sqrt{\frac{\sigma_{1}^{2}}{\xi_{1}^{2}}+\frac{\sigma_{2}^{2}}{\xi_{2}^{2} \cdot A_{1}^{2}}+\frac{\sigma_{3}^{2}}{\xi_{3}^{2} \cdot \prod_{i=1}^{i=2} A_{i}^{2}}+\frac{\sigma_{4}^{2}}{\prod_{i=1}^{i=3} A_{i}^{2}}+\frac{\sigma_{\text {comp }}^{2}}{\prod_{i=1}^{i=4} A_{i}^{2}}}=3.1 \mathrm{mV} \mathrm{rms} \tag{5.4}
\end{equation*}
$$

where $\sigma_{i}$ and $A_{i}$ are the input referred RMS offset and the voltage gain of the $\mathrm{i}^{\text {th }}$ preamplifiers array, respectively, $\sigma_{\text {comp }}$ is the dynamic offset of the comparators, and $\xi_{i}$ is the offset reduction ratio due to offset averaging of the $\mathrm{i}^{\text {th }}$ preamplifiers array. This value for the total input referred offset represents 0.24 LSB.
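The value quoted in (5.4) can be reproduced from the entries of Table 5.1. The sketch below assumes a comparator dynamic offset of 30 mV rms, the figure reported for the latched comparator in Subsection 5.2.1, and takes the LSB as the 0.84-V input range divided by $2^{6}$. The fourth stage is given an averaging factor of 1 because (5.4) applies no $\xi$ term to it.

```python
import math

# (sigma in mV, offset reduction ratio xi, voltage gain A) per stage, Table 5.1.
STAGES = [
    (4.00, 1.6, 2.7),
    (6.51, 1.6, 3.2),
    (11.92, 2.0, 2.9),
    (15.37, 1.0, 2.9),  # no averaging factor on the fourth stage in Eq. (5.4)
]
SIGMA_COMP_MV = 30.0  # comparator dynamic offset (assumed from Subsection 5.2.1)

def total_input_referred_offset(stages, sigma_comp):
    """Root-sum-square of stage offsets referred to the ADC input, Eq. (5.4)."""
    variance = 0.0
    preceding_gain = 1.0
    for sigma, xi, gain in stages:
        variance += (sigma / (xi * preceding_gain)) ** 2
        preceding_gain *= gain
    variance += (sigma_comp / preceding_gain) ** 2
    return math.sqrt(variance)

sigma_total_mv = total_input_referred_offset(STAGES, SIGMA_COMP_MV)
lsb_mv = 840.0 / 2 ** 6
sigma_total_lsb = sigma_total_mv / lsb_mv

print(f"total input referred offset: {sigma_total_mv:.2f} mV rms")
print(f"in LSB units: {sigma_total_lsb:.3f} LSB (LSB = {lsb_mv:.3f} mV)")
```

The first-stage term is the largest contributor in this breakdown, which is why the averaging factor and device sizing of the first array matter most for the overall accuracy.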

Simulation results, shown in Fig. 5.11, indicate that the proposed termination technique limits the maximum mean value of the INL to 0.15 LSB (compared to 4.13 LSB in the case of abrupt termination). So, the over-range voltage is eliminated without deteriorating the linearity of the ADC. The results in Fig. 5.11 are obtained by recording the input voltage value at the zero crossing of the preamplifiers of the fourth stage, subtracting the ideal values from them, and then normalizing the result with respect to the ADC LSB.

The resulting distortion due to this INL deviation can be estimated from

$$
\begin{equation*}
S F D R=-20 \cdot \log \left(|I N L| 2^{-N}+2^{-1.5 N}\right) \tag{5.5}
\end{equation*}
$$

Therefore, for an ideal signal applied to a 6-bit quantizer with a 0.15 LSB INL, the SFDR of the output signal is 47.3 dB. Thus, the highest resulting distortion component is about 10 dB below the quantization noise power for a 6-bit quantizer. To check the ADC output SNDR when the T/H of Subsection 5.1.1 is used, a quantizer model is built in MATLAB with an INL of 0.15 LSB, and the sampled signal plotted in Fig. 5.4 is applied to it. The output SNDR obtained from the model is 37.3 dB. Hence, this INL deviation has a minor effect on the performance.
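The SFDR estimate of (5.5) is straightforward to evaluate. The sketch below applies it to the two INL values quoted above, with a base-10 logarithm assumed; the function name is illustrative.

```python
import math

def sfdr_db(inl_lsb, n_bits):
    """SFDR estimate of Eq. (5.5) for a peak INL given in LSB."""
    return -20.0 * math.log10(abs(inl_lsb) * 2.0 ** (-n_bits)
                              + 2.0 ** (-1.5 * n_bits))

sfdr_proposed = sfdr_db(0.15, 6)  # proposed termination
sfdr_abrupt = sfdr_db(4.13, 6)    # abrupt termination

print(f"INL = 0.15 LSB -> SFDR = {sfdr_proposed:.1f} dB")
print(f"INL = 4.13 LSB -> SFDR = {sfdr_abrupt:.1f} dB")
```

According to this estimate, abrupt termination would place the dominant spur well above the 6-bit quantization noise floor, whereas the proposed termination keeps it below.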


Figure 5.11: INL profile obtained from simulation when using the proposed technique.

The ratio of the input signal range $(0.84 \mathrm{~V})$ to $V_{D D}$ is $56 \%$, a value higher than that of [21], [53], and [16]. This shows that the available voltage range is fully assigned to the input signal. For this design, the reference voltage ladder needs to extend along the 9 preamplifiers of the first stage, while the technique proposed in [56] requires the reference ladder to extend along 11 preamplifiers (the 9 preamplifiers processing the input signal plus 2 dummy preamplifiers). Therefore, the increase in $\eta$ due to over-range voltage elimination is $\left(\frac{11}{9}\right)$. The LSB, as well as the target $\sigma_{\text {offset-total }}^{\prime}$, are increased by the same value, keeping the ADC accuracy unaltered. It follows from (3.1) and (5.4) that the input capacitance of each of the preamplifier arrays is reduced by a factor of $\left(\frac{9}{11}\right)^{2}$. Hence, a $33 \%$ reduction in the input capacitance of the second, third, and fourth preamplifier arrays is achieved. For the first stage, the interface amplifier has double the input capacitance of the preamplifiers. Hence, the net reduction in the input capacitance of the first stage due to the proposed technique is $20 \%$ compared with that of the averaging termination technique of [56]. Since each preamplifier array represents the load of the preceding array, lowering the input capacitance of the preamplification stages results in an increase in the GBW of these stages at the same power dissipation ${ }^{5}$. In other words, the target GBW can be obtained with less power. Thus, the proposed technique not only reduces the ADC input capacitance, but also reduces the power consumption of the analog front end by approximately $33 \%$.
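The capacitance figures above can be checked with a few lines of arithmetic. The sketch assumes that the input capacitance per preamplifier scales as $(9 / 11)^{2}$, as stated in the text. For the first stage it further assumes a count of 9 preamplifiers plus 2 interface amplifiers, each with twice the preamplifier input capacitance, compared against 11 equal preamplifiers; that counting is an interpretation used here to reproduce the quoted 20 percent.

```python
def per_device_cap_scale(n_with_dummies, n_without):
    """Capacitance scaling when the LSB grows by n_with_dummies / n_without."""
    return (n_without / n_with_dummies) ** 2

scale = per_device_cap_scale(11, 9)

# Second to fourth arrays: every device shrinks by the same factor.
later_stage_reduction = 1.0 - scale

# First array (assumed counting): 9 preamplifiers + 2 interface amplifiers
# at twice the capacitance, versus 11 preamplifiers at the original size.
proposed_units = (9 + 2 * 2) * scale
reference_units = 11.0
first_stage_reduction = 1.0 - proposed_units / reference_units

print(f"per-device capacitance scale: {scale:.3f}")
print(f"reduction, stages 2 to 4: {later_stage_reduction:.1%}")
print(f"reduction, first stage:   {first_stage_reduction:.1%}")
```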

### 5.2 The Digital Back End

A block diagram for the digital back end is provided in Fig. 5.12. The output of the last preamplification stage is fed to an array of low kick-back latched comparators [2], followed by two arrays of CMOS latches [60]. Cascading latches provides a power efficient way to reduce the metastability errors [28] [61, 62, 63], as explained later in Subsection 5.2.1.

A 3-input AND gate is used to transform the thermometer code to a 1-of-N code that selects one word of a pre-charged Gray ROM encoder. The output digital stream is down-sampled by a factor of 8 to allow acquisition by the logic analyzer. The divide-by-8 clock is generated on-chip to avoid sampling uncertainty due to the supply bounce.

The two main sources of errors in the digital back end are metastability and the sparkles (bubbles) in the thermometer code. These two sources of errors are addressed in the design of the digital back end, as illustrated in the following subsections.

[^16]

Figure 5.12: Digital back end of the ADC.


Figure 5.13: First stage comparator.

### 5.2.1 Comparators and Latches

The latched comparator used in implementing the ADC is shown in Fig. 5.13. When $\overline{C L K}$ is high (reset phase), both output nodes are pulled to ground, resetting the comparator differential output voltage. The reset transistors $\left(M_{r}\right)$ are sized such that over-drive recovery is completed at the SLOW-SLOW process corner. At the same time, the differential pair at the input amplifies the input signal through the diode connected loads. Since the output node DC potential is around $V_{D D} / 2$, the use of a single transistor to short the output nodes during reset is avoided, as, in this case, the body effect leads to a considerable increase in threshold voltage of the reset transistor. This increases the reset time constant. Sizing the transistor up does not reduce the reset time constant, because the reduction in the transistor resistance is accompanied by a rise in the parasitic capacitance at the output node.

During the regeneration phase, the output nodes are released, and the latch cross-coupled transistors amplify the imbalance created at the comparator output nodes in the preceding phase with a large regenerative gain. The regeneration time constant is a primary design parameter for latched comparators, because it determines the metastability probability according to [24]

$$
\begin{equation*}
P_{E}=\frac{V_{L}}{A V_{i}} e^{-\frac{T_{r e g}}{\tau_{r e g}}}, \tag{5.6}
\end{equation*}
$$

where $V_{L}$ is the output logic level, $A$ is the gain during reset phase, and $V_{i}$ is the input signal voltage. $T_{\text {reg }}$ is the regeneration period and is equal to 218.75 ps for a $1.6 \mathrm{GS} / \mathrm{s}$ clock with rise and fall times occupying $30 \%$ of the clock period. $\tau_{\text {reg }}$ is the regeneration time constant, and is given by

$$
\begin{equation*}
\tau_{\text {reg }}=\frac{r_{\text {out }} C_{\text {out }}}{A_{\text {reg }}-1} \approx \frac{r_{\text {out }} C_{\text {out }}}{A_{\text {reg }}}=\frac{r_{\text {out }} C_{\text {out }}}{G_{m} r_{\text {out }}}=\frac{1}{G_{m} / C_{\text {out }}}, \tag{5.7}
\end{equation*}
$$

where $r_{\text {out }}$ is the output resistance of the comparator, $C_{\text {out }}$ is the capacitance at the output node, and $G_{m}$ is the transconductance of regeneration and is equal to the transconductance of the cross coupled transistor $\left(M_{c}\right)$. If $C_{o u t}$ is dominated by the gate capacitance of transistor $M_{c}$, then

$$
\begin{equation*}
\tau_{r e g} \approx \frac{1}{\mu_{n} V_{o v} / L^{2}} \tag{5.8}
\end{equation*}
$$

where $\mu_{n}$ is the mobility of electrons, $L$ is the transistor channel length, and $V_{o v}$ is the overdrive voltage of transistor $M_{c}$. Eq. (5.7) shows that the GBW of the regeneration sets the value of the regeneration time constant. To maximize the GBW, the channel length of the cross-coupled transistors is set to the minimum feature size. The overdrive voltage of transistors $M_{c}$ may be increased to improve the gain-bandwidth, but power dissipation increases quadratically with the overdrive voltage. On the other hand, when cascading $n$ identical latches, the regeneration gain becomes

$$
\begin{equation*}
A_{\text {reg }(n)}=\prod_{j=1}^{n} A_{j} e^{\frac{T_{\text {reg }}}{\tau_{\text {reg }}}}=A^{n} e^{\frac{T_{\text {reg }}}{\tau_{\text {reg }} / n}} \tag{5.9}
\end{equation*}
$$

Hence, if $\tau_{\text {reg }}$ is to be reduced by a factor of $n, n$ latches can be cascaded, and the power dissipation is increased by $n$ times (linear increase). Achieving the same regeneration time constant by using a single stage would lead to an approximately $n^{2}$ increase in power. Thus, cascading latches is considered a power-efficient way for achieving a target metastability performance.
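The power comparison can be made explicit with a short scaling argument. This is only a sketch: it assumes that (5.8) holds, that power grows quadratically with $V_{o v}$ as stated above, and it introduces $P_{0}$ (the power of one latch at the original overdrive voltage) purely for illustration.

$$
\begin{aligned}
\text{single stage:}\quad & \tau_{\text{reg}} \rightarrow \frac{\tau_{\text{reg}}}{n} \;\Rightarrow\; V_{ov} \rightarrow n V_{ov} \;\Rightarrow\; P \approx n^{2} P_{0}, \\
\text{cascade of } n \text{ latches:}\quad & \tau_{\text{reg}} \rightarrow \frac{\tau_{\text{reg}}}{n} \text{ by (5.9)} \;\Rightarrow\; P \approx n P_{0} .
\end{aligned}
$$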


Figure 5.14: The CMOS latch

In this work, the latched comparator is followed by two CMOS latch stages (Fig. 5.14). The GBW of the regeneration loop for each CMOS latch is 11 GHz at the SLOW-SLOW process corner, and that of the latched comparator of Fig. 5.13 is 12 GHz. Therefore, a $P_{E}$ of less than $2 \times 10^{-20}$ is attained.
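The exponential part of (5.6), extended to the three cascaded stages as in (5.9), can be evaluated numerically. The Python sketch below makes two assumptions that are not stated in the text: each time constant is taken as $\tau = 1/(2\pi \cdot \mathrm{GBW})$, and $T_{\text{reg}}$ is taken as half a clock period minus one clock edge (half of the 30% edge budget), which reproduces the quoted 218.75 ps. The prefactor $V_{L}/(A V_{i})$ is left out because its values are not given here, and all function names are illustrative.

```python
import math

F_CLK = 1.6e9          # sampling clock frequency quoted in the text (Hz)
EDGE_FRACTION = 0.30   # rise + fall time as a fraction of the period (from the text)


def regeneration_time(f_clk=F_CLK, edge_fraction=EDGE_FRACTION):
    """Half a clock period minus one edge (assumed: edges split equally)."""
    period = 1.0 / f_clk
    return period / 2.0 - (edge_fraction / 2.0) * period


def tau_from_gbw(gbw_hz):
    """Regeneration time constant, assuming tau = 1 / (2*pi*GBW)."""
    return 1.0 / (2.0 * math.pi * gbw_hz)


def regeneration_factor(gbw_list_hz, t_reg):
    """exp(-sum_j T_reg / tau_j) for a cascade of latches, following (5.9)."""
    exponent = sum(t_reg / tau_from_gbw(gbw) for gbw in gbw_list_hz)
    return math.exp(-exponent)


t_reg = regeneration_time()
stages = [12e9, 11e9, 11e9]  # latched comparator followed by two CMOS latches
factor = regeneration_factor(stages, t_reg)
print(f"T_reg = {t_reg * 1e12:.2f} ps, exponential factor = {factor:.2e}")
```

Under these assumptions the exponential factor alone is on the order of $10^{-21}$; the final error probability also depends on the prefactor of (5.6).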

In addition to the metastability performance, the dynamic offset of the latched comparator is of prime concern in flash ADC design, since this dynamic offset dictates the gain of the preamplification stage ${ }^{6}$. Fig. 5.15 shows the histogram of the input-referred offset resulting from 400 Monte Carlo transient simulations. In these simulation runs, a slow time-domain ramp signal is applied to the input of the comparator, in addition to the clock signal. The input voltage that causes the output to cross zero (the offset) is then recorded for each run. The estimated input-referred offset of the latched comparator is 30 mV rms.
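The post-processing of such Monte Carlo runs can be sketched in Python. This is not the transistor-level simulation used in the thesis: the comparator is replaced by a hypothetical behavioural model whose offset is drawn from a Gaussian distribution with a 30 mV spread (borrowed from the quoted estimate), and the ramp range, step size, and seed are arbitrary illustrative choices. The sketch only shows how the zero-crossing of each ramp sweep is recorded and how the rms value is formed.

```python
import math
import random

SIGMA_OFFSET = 0.030   # hypothetical offset spread (V), borrowed from the 30 mV estimate
N_RUNS = 400           # number of Monte Carlo runs quoted in the text
RAMP_START, RAMP_STOP, RAMP_STEP = -0.3, 0.3, 0.5e-3  # illustrative slow ramp (V)


def comparator_decision(v_in, v_offset):
    """Behavioural stand-in for one clocked comparison (+1 or -1)."""
    return 1 if v_in > v_offset else -1


def offset_from_ramp(v_offset):
    """Sweep the ramp and return the input voltage at which the decision flips."""
    steps = int(round((RAMP_STOP - RAMP_START) / RAMP_STEP))
    previous = RAMP_START
    for k in range(1, steps + 1):
        v_in = RAMP_START + k * RAMP_STEP
        if comparator_decision(v_in, v_offset) > 0:
            return 0.5 * (previous + v_in)  # midpoint of the bracketing ramp steps
        previous = v_in
    return None  # no crossing inside the ramp range


def monte_carlo_offsets(n_runs=N_RUNS, sigma=SIGMA_OFFSET, seed=1):
    """Record the crossing voltage of each run, skipping runs without a crossing."""
    rng = random.Random(seed)
    crossings = []
    for _ in range(n_runs):
        crossing = offset_from_ramp(rng.gauss(0.0, sigma))
        if crossing is not None:
            crossings.append(crossing)
    return crossings


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


offsets = monte_carlo_offsets()
print(f"runs with a crossing: {len(offsets)}, rms offset: {1e3 * rms(offsets):.1f} mV")
```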

[^17]

Figure 5.15: Input referred offset of first stage comparator.

### 5.2.2 The Digital Logic

The output bits from the last-stage latches represent a thermometer code. Ideally, this code consists of a group of logic ones followed by a group of zeros. In practice, however, sparkles (bubbles) may arise in this code. In general, the main sources of these sparkles are the lack of a front-end sample-and-hold circuit [37] and the variation of the propagation delay through the preamplifiers due to their limited bandwidth. In addition, a total input-referred offset greater than 0.5 LSB can switch the order of two adjacent thresholds, causing a bubble in the thermometer code [16]. In this work, a front-end $\mathrm{T} / \mathrm{H}$ circuit is used, and preamplification is attained by using low-gain stages to maximize their bandwidth. Hence, the first two sources of error are eliminated. Since the standard deviation of the total input-referred offset is 0.24 LSB for the designed ADC, there remains a small probability that the offset exceeds 0.5 LSB. Thus, a 3-input AND gate is used for first-order bubble error suppression. The 3-input AND gate is implemented as shown in Fig. 5.16. It is formed of a clocked 3-input NAND gate followed by a clocked inverter [30]. When CLK is high, the NAND gate is transparent, and the internal node Z is evaluated. At the same time, the output of the inverter is discharged to zero. This allows the ROM (Fig. 5.17) to pre-charge its output node because all the ROM pull-down devices are OFF in this case. During the next half clock cycle (CLK low), the output of the clocked inverter goes high only if node Z evaluated to zero, and one ROM word is selected.
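The effect of first-order bubble suppression can be illustrated with a small behavioural model in Python. The exact input wiring of the gate in Fig. 5.16 is not described in this section, so the sketch assumes the common arrangement in which a ROM line is selected when a comparator and the one below it output 1 while the one above outputs 0. The function names and the padding with virtual comparators at both ends are illustrative.

```python
def select_lines(thermo, n_inputs=3):
    """Return one ROM select bit per comparator (index 0 = lowest threshold).

    n_inputs=2 detects a plain 1-to-0 transition; n_inputs=3 additionally
    requires a 1 from the comparator below (assumed wiring).
    """
    padded = [1] + list(thermo) + [0]  # virtual comparators below and above the array
    lines = []
    for i in range(1, len(padded) - 1):
        hit = padded[i] & (1 - padded[i + 1])   # 1-to-0 transition
        if n_inputs == 3:
            hit &= padded[i - 1]                # neighbour below must also be 1
        lines.append(hit)
    return lines


def selected_words(lines):
    """ROM word numbers (count of ones below the transition) that are selected."""
    return [i + 1 for i, bit in enumerate(lines) if bit]


ideal = [1, 1, 1, 0, 0, 0, 0]
bubble = [1, 1, 1, 0, 1, 0, 0]   # isolated 1 above the true transition

print(selected_words(select_lines(ideal)))
print(selected_words(select_lines(bubble, n_inputs=2)))
print(selected_words(select_lines(bubble, n_inputs=3)))
```

With the 2-input detector the isolated bubble selects a second ROM word, whereas the 3-input version selects only the word at the true transition.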

The output of the ROM is held by the True-Single-Phase-Clocked (TSPC) flip-flop of Fig. 5.18 for a whole clock period. This allows down-sampling of the data by the divide-by-8 clock. A timing diagram summarizing the operation of the digital back end is given in Fig. 5.19.

Figure 5.16: Clocked 3-input AND gate to generate ROM address.

## The Clock Circuitry

The high-speed differential sinusoidal clock applied to the ADC is shaped to almost a square wave by a clock driver formed of a cascade of CMOS inverters. The input clock signal amplitude is 0.4 V, and the clock input is terminated on-chip with two polysilicon $50 \Omega$ resistors. The width of the resistors is dictated by the RMS current handling capability of polysilicon. The clock driver is sized such that $C L K$ and $\overline{C L K}$ drive estimated total capacitive loads of 0.3 pF and 0.25 pF, respectively, at $1.6 \mathrm{GS} / \mathrm{s}$ for all the process corners. In addition, the $\overline{C L K}$ signal is applied to a clock divider formed of three flip-flops similar to that of Fig. 5.18. The clock divider provides the clock signal needed for down-sampling the output data and is buffered so that it can drive the logic analyzer in state-mode operation for testing.


Figure 5.17: The pre-charged ROM.


Figure 5.18: TSPC flip-flop used to hold the output of the ROM.


Figure 5.19: Timing diagram of the digital back end.

## Chapter 6

## Measurements

The designed ADC was fabricated in $0.13-\mu \mathrm{m}$ 8-metal single-poly CMOS technology. A microphotograph of the chip is shown in Fig. 6.1. The analog front end is surrounded by a guard ring connected to the analog ground to isolate it from the digital noise. The active area of the design is $0.42 \mathrm{~mm}^{2}$. On-chip $0.12-\mathrm{nF}$ and $0.4-\mathrm{nF}$ capacitors are added to decouple the analog and digital supplies, respectively. The decoupling capacitors are implemented as thin-oxide NFET-in-Nwell MOS capacitors. To reduce the supply bounce even further, nine pads are assigned to the digital supply rails ($V_{D D}$ and $V_{S S}$) and six pads to the analog supply rails ($V_{D D A}$ and $V_{S S A}$). Thus, the series wire-bond inductance is reduced significantly.

### 6.1 Testing Setup

The chip was mounted on an FR-4 PCB and directly wire-bonded to the board for testing (Fig. 6.2). The PCB is coated with a 0.01 mil layer of gold so that the gold wire bonds adhere reliably to the board surface. For the best high-speed performance, a 4-layer PCB is chosen in which the two outer layers are assigned to routing and the two inner layers are dedicated to the ground and power supply. Each of the inner planes is split into three separate planes for the analog, digital, and input clock sections of the chip. Ferrite beads are used to connect these planes electrically and to prevent digital and clock noise from corrupting the analog signals ${ }^{1}$. The PCB trace widths of the input analog signal, input clock signal, output bits, and output clock signal are sized such that a $50 \Omega$ characteristic impedance is maintained.

Figure 6.1: Microphotograph of the chip.

Figure 6.2: Chip mounted on PCB for testing.

[^18]

Fig. 6.3 depicts the complete setup for testing. The input analog signal to the ADC is generated by a signal generator and fed to a phase splitter to create a differential signal. Two bias-Ts add the proper common-mode voltage to the input signal. Since the harmonic distortion of the signal generator is limited to -30 dBc, which is not low enough to test a 6-bit ADC, a lowpass ${ }^{2}$ filter is used to attenuate the harmonics of the generated signal. The input signal is supplied to the PCB through SMA connectors and terminated on-chip using $50 \Omega$ poly resistors. A similar signal path is utilized for the high-speed clock signal, but with no filtering. The measured jitter of the clock signal generator is $0.43 \mathrm{ps}_{r m s}$ when running at $1.6 \mathrm{GS} / \mathrm{s}$ ${ }^{3}$. The signal generators for the input signal and clock are phase-locked to allow coherent sampling of the input signal. The output bits and the divide-by-8 clock are driven off-chip by $50 \Omega$ buffers, and their DC component is decoupled using bias-Ts. The output signals are properly terminated with $50 \Omega$ BNC feedthrough terminators. Finally, a BNC-to-probe-tip adapter connects each output signal to a logic analyzer. The captured bits are then transferred to a PC to be analyzed.
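The quoted generator figures can be put in perspective with two textbook relations that are not taken from this thesis: the ideal quantization-limited SNDR of an $N$-bit converter, $6.02N + 1.76$ dB, and the aperture-jitter bound $\mathrm{SNR} = -20 \log_{10}(2 \pi f_{in} \sigma_{j})$. The Python sketch below evaluates both for the values in the text; the function names are illustrative, and the jitter bound assumes that the clock generator is the only jitter source.

```python
import math


def ideal_sndr_db(n_bits):
    """Quantization-limited SNDR of an ideal N-bit converter (full-scale sine)."""
    return 6.02 * n_bits + 1.76


def jitter_limited_snr_db(f_in_hz, sigma_jitter_s):
    """SNR bound set by rms sampling jitter for a sine input at f_in."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)


sndr_6bit = ideal_sndr_db(6)
snr_jitter = jitter_limited_snr_db(1.45e9, 0.43e-12)

print(f"ideal 6-bit SNDR        : {sndr_6bit:.2f} dB")
print(f"jitter bound at 1.45 GHz: {snr_jitter:.2f} dB")
```

The ideal 6-bit SNDR of about 37.9 dB exceeds the 30 dB harmonic suppression of the unfiltered generator, which is consistent with the need for the lowpass filter. The jitter bound of roughly 48 dB at 1.45 GHz lies well above the 6-bit level.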

### 6.2 Measurement Results

The dynamic performance of the ADC is evaluated by coherent sampling. This technique eliminates the need for windowing when the FFT is performed. The spectra of the reconstructed ADC output signal for input frequencies of $50 \mathrm{MHz}, 800 \mathrm{MHz}$, and 1.45 GHz are shown in Fig. 6.4, Fig. 6.5, and Fig. 6.6, respectively. The harmonics of the fundamental of the ADC output signal fold back to frequencies below half the sampling rate.
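The folding of the fundamental and its harmonics into the first Nyquist zone can be computed directly. The following Python sketch assumes an effective sampling rate of 1.6 GS/s, as in the figure captions, and ignores any effect of the divide-by-8 data capture; the helper names are illustrative.

```python
def folded_frequency(f_hz, f_s_hz):
    """Frequency at which a tone at f_hz appears after sampling at f_s_hz."""
    f_mod = f_hz % f_s_hz
    return min(f_mod, f_s_hz - f_mod)


def harmonic_locations(f_in_hz, f_s_hz, n_harmonics=5):
    """Folded locations of the fundamental (k = 1) and harmonics k = 2..n."""
    return [folded_frequency(k * f_in_hz, f_s_hz) for k in range(1, n_harmonics + 1)]


F_S = 1.6e9
for f_in in (50.04e6, 800.04e6, 1450.008e6):
    locations = harmonic_locations(f_in, F_S, n_harmonics=3)
    print(f_in / 1e6, [round(f / 1e6, 3) for f in locations])
```

For the 1450.008 MHz input, for example, the fundamental appears at 149.992 MHz and the second and third harmonics at 299.984 MHz and 449.976 MHz.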

[^19]

Figure 6.3: Testing setup.

Figure 6.4: Measured signal spectrum for an input frequency of 50.04 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.

Figure 6.5: Measured signal spectrum for an input frequency of 800.04 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.

Figure 6.6: Measured signal spectrum for an input frequency of 1450.008 MHz sampled at 1.6 GS/s. FFT performed with 8192 samples.

The ADC non-linearity is measured by the histogram method (code density test) [22] at 1.6 GS/s. Fig. 6.7 shows the measured integral non-linearity (INL) and differential non-linearity (DNL) values, along with the simulation results for the mean values. The maximum INL and DNL deviations are found to be 0.42 LSB and 0.49 LSB, respectively. The dynamic performance of the ADC at 1.6 GS/s is shown in Fig. 6.8. The ADC achieves an SNDR of 34.5 dB for a $50-\mathrm{MHz}$ input signal. The effective resolution bandwidth (ERBW) is equal to 800 MHz. However, the SNDR remains higher than 30 dB up to an input signal frequency of 1450 MHz. To the best of the author's knowledge, an SNDR of 30 dB at an input frequency of 1.45 GHz has not been reported before for a single-channel CMOS ADC that does not use calibration. The entire ADC operates from a $1.5-\mathrm{V}$ supply. The analog portion of the ADC consumes 81 mA, whereas the digital circuitry consumes 35.8 mA. The total power dissipation of the ADC, including the reference ladder, is 180 mW.
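The histogram (code density) test mentioned above can be sketched in Python. The sketch assumes a sine-wave input, for which the code transition levels follow from the cumulative histogram through an inverse-cosine mapping; the type of test signal used for the reported measurement is not stated in this section. The helper names, the ideal quantizer used to generate test data, and the choice of normalizing the LSB to the mean code width are all illustrative.

```python
import math


def dnl_inl_from_sine_histogram(hist):
    """Estimate DNL and INL (in LSB) of the inner codes from a sine-wave histogram."""
    total = sum(hist)
    transitions = []
    running = 0
    for count in hist[:-1]:
        running += count
        # Transition level normalized to the sine amplitude
        transitions.append(-math.cos(math.pi * running / total))
    widths = [upper - lower for lower, upper in zip(transitions, transitions[1:])]
    lsb = sum(widths) / len(widths)           # mean code width defines 1 LSB
    dnl = [w / lsb - 1.0 for w in widths]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


def ideal_sine_histogram(n_bits=6, n_samples=1 << 16, amplitude=1.0):
    """Histogram of an ideal quantizer driven by one uniformly sampled sine period."""
    n_codes = 1 << n_bits
    hist = [0] * n_codes
    for i in range(n_samples):
        v = amplitude * math.sin(2.0 * math.pi * (i + 0.5) / n_samples)
        code = int((v + 1.0) / 2.0 * n_codes)
        hist[min(max(code, 0), n_codes - 1)] += 1
    return hist


hist = ideal_sine_histogram()
dnl, inl = dnl_inl_from_sine_histogram(hist)
print(f"max |DNL| = {max(abs(d) for d in dnl):.4f} LSB, "
      f"max |INL| = {max(abs(v) for v in inl):.4f} LSB")
```

For the ideal quantizer both values come out close to zero; with measured output codes in place of `ideal_sine_histogram`, the same routine would return estimates of the kind plotted in Fig. 6.7.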

Based on the measurements, the figures of merit (FOMs) of the ADC can be evaluated as follows:

$$
\begin{equation*}
\mathrm{FOM}_{1}=\frac{\text { Power }}{2^{\mathrm{ENOB}_{@ \mathrm{DC}}} \cdot 2 \cdot \mathrm{ERBW}}=2.6 \mathrm{pJ} / \mathrm{conv} \tag{6.2}
\end{equation*}
$$

and

$$
\begin{equation*}
\mathrm{FOM}_{2}=\frac{\text { Power }}{2^{\mathrm{ENOB}_{@ \mathrm{DC}}} \cdot f_{s}}=2.6 \mathrm{pJ} / \mathrm{conv} \tag{6.3}
\end{equation*}
$$

Figure 6.7: Measured INL and DNL at 1.6 GS/s.

A summary of the ADC performance is given in Table 6.1.
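The two figures of merit can be reproduced from the measured values with a few lines of Python. The ENOB conversion $(\mathrm{SNDR} - 1.76)/6.02$ is the standard relation and is assumed here; the function names are illustrative. The values for [2] are those listed in Table 6.2.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (standard relation, assumed here)."""
    return (sndr_db - 1.76) / 6.02


def fom1(power_w, enob, erbw_hz):
    """FOM_1 = Power / (2^ENOB * 2 * ERBW), in J per conversion step."""
    return power_w / (2.0 ** enob * 2.0 * erbw_hz)


def fom2(power_w, enob, f_s_hz):
    """FOM_2 = Power / (2^ENOB * f_s), in J per conversion step."""
    return power_w / (2.0 ** enob * f_s_hz)


enob_dc = enob_from_sndr(34.5)                   # SNDR at the 50-MHz input
this_work = (fom1(0.180, 5.44, 800e6), fom2(0.180, 5.44, 1.6e9))
ref_2 = (fom1(0.160, 5.7, 700e6), fom2(0.160, 5.7, 1.2e9))

print(f"ENOB      : {enob_dc:.2f} bits")
print(f"this work : {this_work[0] * 1e12:.2f} / {this_work[1] * 1e12:.2f} pJ/conv")
print(f"[2]       : {ref_2[0] * 1e12:.2f} / {ref_2[1] * 1e12:.2f} pJ/conv")
```

Because the ERBW of this design equals half the sampling rate, both figures of merit evaluate to the same value.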

Fig. 6.9 shows the input signal frequency at 5 effective number of bits (ENOB) versus sampling frequency for previously reported 6-bit flash ADCs and the ADC of this work. Fig. 6.10 and Fig. 6.11 plot $\mathrm{FOM}_{1}$ and $\mathrm{FOM}_{2}$ of these ADCs. The ADC reported in [53] has a higher sampling speed of 2 GS/s only because it utilizes time-interleaving. Nevertheless, it has a smaller bandwidth and a worse figure-of-merit than the ADC of this work. Compared to the work in [2], which uses the same technology, the designed ADC achieves a wider bandwidth and a higher sampling speed at nearly the same $\mathrm{FOM}_{2}$. Hence, the designed ADC provides a power saving similar to that offered by the capacitive interpolation and capacitive reference voltage generation of [2], but at the same time, the sampling speed is not limited by the constraint of generating and using two non-overlapping clock phases. Also, the ADC of this work achieves a wider bandwidth than that reported in [56] at a lower $\mathrm{FOM}_{1}$ and $\mathrm{FOM}_{2}$, while operating at the same sampling speed. Therefore, it is concluded that the designed ADC achieves a superior dynamic performance combined with a low power dissipation.

Table 6.1: ADC performance summary.

| Parameter | Value |
| :--- | :--- |
| Analog Input | 0.84 Vp-p differential |
| Input Capacitance | 380 fF |
| Resolution | 6 bits |
| INL @ $f_{\text {in }}=50 \mathrm{MHz}$ | $<0.42$ LSB |
| DNL @ $f_{\text {in }}=50 \mathrm{MHz}$ | $<0.49$ LSB |
| ENOB @ DC | 5.44 bits |
| ERBW | 800 MHz |
| SNDR @ $f_{\text {in }}=50 \mathrm{MHz} / 1.45 \mathrm{GHz}$ | $34.5 \mathrm{~dB} / 30 \mathrm{~dB}$ |
| Power Dissipation | 180 mW |
| $\mathrm{FOM}_{1} / \mathrm{FOM}_{2}$ | $2.6 \mathrm{pJ} / \mathrm{conv} / 2.6 \mathrm{pJ} / \mathrm{conv}$ |
| Conversion Rate | $1.6 \mathrm{GS} / \mathrm{s}$ |
| Supply Voltage | 1.5 V |
| Test Chip Area | $0.42 \mathrm{~mm}^{2}$ |
| Technology | $0.13-\mu \mathrm{m}$ CMOS |

Figure 6.8: Measured SNDR and SFDR at 1.6 GS/s.

A complete comparison of the ADC of this thesis with previously reported 6-bit ADCs of flash, pipelined, and successive approximation (SAR) architectures is listed in Table 6.2. The ADC of [63], which time-interleaves 8 SAR ADCs, achieves the lowest FOM among all ADCs for two main reasons. First, it employs the successive approximation architecture, which is one of the most power-efficient ADC architectures. Second, it exploits time-interleaving, which is a power-efficient way to increase the effective sampling speed. However, the sampling speed and bandwidth of [63] remain much lower than those of the other ADCs in the table and than those attainable in $90-\mathrm{nm}$ CMOS technology. Also, it is hard to achieve a higher sampling speed by time-interleaving more SAR ADCs, because clock skew, gain mismatch, and offset mismatch among the interleaved ADCs would again limit the dynamic performance. Therefore, this technique remains useful for applications that require low power dissipation and can tolerate low dynamic performance. Although the pipelined ADC of [51] uses a power-efficient architecture and time interleaving, its sampling speed, bandwidth, and FOMs are inferior to those of the ADC of this work.


Figure 6.9: The input signal frequency at 5 ENOB vs. sampling frequency for previously reported 6-bit flash ADCs and this work.

Table 6.2: 6-bit ADCs comparison

| Author/Year | Architecture | Technology | Sampling rate (MS/s) | ENOB | ERBW (MHz) | Power (mW) | $\mathrm{FOM}_{1} / \mathrm{FOM}_{2}$ (pJ/conv) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Draxelmayr [63] (ISSCC04) | Interleaved SAR | 90 nm | 600 | 5.35 | 300 | 10 | 0.4/0.4 |
| Shen (JSSC07) | Interleaved pipelined | $0.18 \mu \mathrm{m}$ | 800 | 5.35 | 460 | 105 | 2.8/3.22 |
| Jiang [21] (ISSCC03) | Interleaved flash | $0.18 \mu \mathrm{m}$ | 2000 | 5.7 | 550 | 310 | 5.4/3 |
| Choi [16] (JSSC01) | Flash | $0.35 \mu \mathrm{m}$ | 1300 | 5.1 | 650 | 545 | 12.1/12.1 |
| Geelen [18] (ISSCC01) | Flash | $0.35 \mu \mathrm{m}$ | 900 | 5.65 | 450 | 300 | 6.6/6.6 |
| Koen (JSSC03) | Flash | $0.25 \mu \mathrm{m}$ | 1300 | 5.5 | 500 | 600 | 13.2/10.2 |
| Scholten [56] (JSSC02) | Flash | $0.18 \mu \mathrm{m}$ | 1600 | 5.7 | 560 | 340 | 5.8/4 |
| Christoph [2] (JSSC05) | Flash | $0.13 \mu \mathrm{m}$ | 1200 | 5.7 | 700 | 160 | 2.2/2.56 |
| This work | Flash | $0.13 \mu \mathrm{m}$ | 1600 | 5.44 | 800 | 180 | 2.6/2.6 |



Figure 6.10: Figure-of-merit $\left(\mathrm{FOM}_{1}\right)$ for previously reported 6-bit ADCs and this work.


Figure 6.11: Figure-of-merit $\left(\mathrm{FOM}_{2}\right)$ for previously reported 6-bit ADCs and this work.

## Chapter 7

## Conclusions

The world's rising need for multimedia applications and the increasing demand for larger communication bandwidth require the continuous development and expansion of communication receivers. Most contemporary receiver architectures incorporate an ADC to transform the received information into digital format. Therefore, extending the ADC bandwidth is essential so as not to limit the overall performance. In addition to communication systems, maximizing the ADC bandwidth is a requirement for other applications such as digital oscilloscopes.

Although the flash ADC architecture achieves the highest sampling speed in a given technology, the exponential dependence of its input capacitance on the target resolution, combined with the offset versus gate-area tradeoff of the preamplifier differential pairs, results in a relatively large input capacitance. Moreover, the limited supply voltage offered by future technologies will outweigh the accuracy improvement due to their superior MOS matching properties. Therefore, optimizing the bandwidth-accuracy tradeoff of the flash ADC and minimizing its input capacitance are crucial to advancing the state of the art. Averaging and interpolation are effective ways to reduce this input capacitance. However, the over-range voltage required to terminate the resistor averaging and interpolating networks reduces the efficiency of these techniques, rendering them less suitable for integration in deep-submicron technologies.

In this work, the input capacitance-accuracy tradeoff of the flash architecture is analyzed. It is shown that, for a given input capacitance, the lower the number of input preamplifiers, the higher the achieved accuracy. This is because a small number of preamplifiers allows a larger gate area for each preamplifier, and hence better averaging of the gate-oxide non-idealities that cause mismatch. As a result, a higher matching level is attained. If the number of preamplifiers is increased, however, the gate area assigned to each preamplifier drops, and the averaging becomes limited by the value of the averaging resistors. Consequently, interpolating the required zero-crossings from a low number of input preamplifiers leads to a higher accuracy. An efficient way to reduce the number of input preamplifiers is to use cascaded interpolation, which avoids the drawbacks of applying a large interpolation ratio to a single preamplifier stage.

In addition, this work presents a new termination technique for the averaging and interpolating networks of flash ADCs that cancels out the over-range voltage headroom consumed by the flash ADC reference ladder. The proposed technique is based on using interface amplifiers that connect to the in-range reference ladder voltage taps but have a shifted zero-crossing point. With no consumed over-range voltage, a larger value for the ADC LSB is permitted, and the matching requirements of the preamplifier arrays are relaxed. Therefore, a reduction in the ADC input capacitance and power dissipation is achieved. Also, eliminating the over-range voltage makes flash ADCs more amenable to integration in deep-submicron technologies. Compared to the method of triple cross-connection [53], the technique of this thesis results in a lower mean INL value, and hence better ADC linearity. Also, the newly developed technique takes the RMS value of the offset into account and keeps this value constant across the preamplifier array.

The performance improvement that can be attained with the proposed termination technique is demonstrated through the design of a 6-bit 1.6-GS/s flash ADC in $0.13-\mu \mathrm{m}$ CMOS technology. The elimination of the over-range voltage results in a $20 \%$ reduction in the input capacitance and about $33 \%$ savings in the power dissipation of the analog front end. As a result, the ADC of this work achieves almost the same $\mathrm{FOM}_{2}$ as the low-power design of [2], which uses capacitive interpolation, but the ADC sampling speed is not limited by the need to operate with non-overlapping clocks. Therefore, the ADC runs at a $33 \%$ higher sampling frequency in the same technology. Furthermore, the ADC achieves a wide bandwidth of 1.45 GHz with an SNDR greater than 30 dB. By using the proposed termination technique, the maximum mean value of the INL is kept at a low value of 0.15 LSB (compared to 4.13 LSB in the case of abrupt termination). Therefore, the over-range voltage is eliminated without deteriorating the linearity of the ADC.

## Future work

There is a growing trend to use calibration to correct for the errors of high-speed ADCs. In [40] and [41], foreground calibration is applied directly to reduce the comparator offsets, and no preamplifiers are used to implement 4-bit flash ADCs. For higher resolutions, however, a flash ADC would employ a large number of comparators. Therefore, for resolutions of 6 bits and higher (as in the work of this thesis), the ADC is preceded by an interpolating analog front end to reduce the number of amplifiers loading the sample-and-hold circuit and to reduce the comparator input-referred offset. In this case, calibration is applied to the preamplifiers [39]. The technique reported in this thesis can be combined with foreground calibration ${ }^{1}$ to further improve the performance.

Since the chip implemented in this work achieves a wide bandwidth, a direct extension is to time-interleave a number of the designed ADCs to realize a much faster ADC. This needs to be preceded by a study of the optimum number of ADCs that can be interleaved before clock skew, gain mismatch, and offset mismatch among the interleaved ADCs deteriorate the performance and outweigh the benefit of time-interleaving. However, designing a clock generator and a clock distribution circuit to drive the ADCs in such a system remains a challenging task.

[^20]
## Appendix A

## The Impulse Response of a $\times 2$ Interpolating Network Treated as a Spatial Filter

In this appendix, the impulse response of a $\times 2$ interpolating network, treated as a spatial filter, is derived. The network is shown in Fig. 3.6 (b) with an impulsive input in the space domain. Applying KCL to the filter nodes yields

$$
\begin{equation*}
i_{\text {in }}=i_{\text {out }}(n)+\frac{R_{0}}{2 R_{1}}\left(i_{\text {out }}(n)-i_{\text {out }}(n+2)\right)+\frac{R_{0}}{2 R_{1}}\left(i_{\text {out }}(n)-i_{\text {out }}(n-2)\right) . \tag{A.1}
\end{equation*}
$$

By re-arranging (A.1)

$$
\begin{equation*}
i_{\text {in }}=\left(1+\frac{R_{0}}{R_{1}}\right) i_{\text {out }}(n)-\frac{R_{0}}{2 R_{1}}\left(i_{\text {out }}(n+2)+i_{\text {out }}(n-2)\right) . \tag{A.2}
\end{equation*}
$$

In the Z-domain,

$$
\begin{equation*}
H(Z)=\frac{I_{\text {out }}(Z)}{I_{\text {in }}(Z)}=\frac{1}{\left(\left(1+\frac{R_{0}}{R_{1}}\right)-\frac{R_{0}}{2 R_{1}}\left(Z^{2}+Z^{-2}\right)\right)} \tag{A.3}
\end{equation*}
$$

Let

$$
\begin{equation*}
\lambda=\frac{R_{0}}{R_{1}} \tag{A.4}
\end{equation*}
$$

To expand $H(Z)$ to lower order terms using partial fraction expansion, the poles of $H(Z)$ need to be calculated as follows:

$$
\begin{gather*}
(1+\lambda)-\frac{\lambda}{2}\left(\zeta^{2}+\frac{1}{\zeta^{2}}\right)=0  \tag{A.5}\\
\frac{e^{\ln \left(\zeta^{2}\right)}+e^{-\ln \left(\zeta^{2}\right)}}{2}=\frac{1}{\lambda}(1+\lambda),  \tag{A.6}\\
\cosh \left(\ln \left(\zeta^{2}\right)\right)=\left(1+\frac{1}{\lambda}\right)  \tag{A.7}\\
\ln \left(\zeta^{2}\right)=\left|\cosh ^{-1}\left(1+\frac{1}{\lambda}\right)\right| \tag{A.8}
\end{gather*}
$$

A modulus is added because $\cosh (x)=\cosh (-x)$. From (A.8)

$$
\begin{gather*}
\zeta= \pm e^{ \pm \frac{1}{2}\left|\cosh ^{-1}\left(1+\frac{1}{\lambda}\right)\right|}  \tag{A.9}\\
\zeta= \pm r^{ \pm 1} \tag{A.10}
\end{gather*}
$$

where

$$
\begin{equation*}
r=e^{\frac{1}{2}\left(\cosh ^{-1}\left(1+\frac{1}{\lambda}\right)\right)} \tag{A.11}
\end{equation*}
$$

Eqs. (A.9) and (A.10) give the four poles of the transfer function in (A.3). Therefore, $H(Z)$ is described as

$$
\begin{equation*}
H(Z)=\frac{A}{Z-r}+\frac{B}{Z+r}+\frac{C}{Z+1 / r}+\frac{D}{Z-1 / r} \tag{A.12}
\end{equation*}
$$

where $A$, $B$, $C$, and $D$ ${ }^{1}$ are constants. To obtain the values of these constants, (A.12) and (A.3) are equated and solved for the four constants. This results in

$$
\begin{align*}
& A=-B=-\frac{r\left(-1+r^{2}\right)}{2\left(1+r^{2}\right)}  \tag{A.13}\\
& C=-D=-\frac{\left(-1+r^{2}\right)}{2 r\left(1+r^{2}\right)}
\end{align*}
$$
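The expansion (A.12) with the constants of (A.13) can be checked numerically against (A.3). The Python sketch below evaluates both expressions at a few arbitrary complex test points; the function names and the test values of $\lambda$ and $Z$ are illustrative choices.

```python
import math


def transfer_function(z, lam):
    """H(Z) as given in (A.3), with lam = R0 / R1."""
    return 1.0 / ((1.0 + lam) - 0.5 * lam * (z**2 + z**-2))


def partial_fractions(z, lam):
    """Sum of the four terms of (A.12) with the constants of (A.13)."""
    r = math.exp(0.5 * math.acosh(1.0 + 1.0 / lam))       # (A.11)
    a = -r * (r**2 - 1.0) / (2.0 * (1.0 + r**2))
    c = -(r**2 - 1.0) / (2.0 * r * (1.0 + r**2))
    b, d = -a, -c
    return a / (z - r) + b / (z + r) + c / (z + 1.0 / r) + d / (z - 1.0 / r)


for lam in (0.25, 1.0, 4.0):
    for z in (1.0, 0.7 + 0.3j, -0.4 + 1.1j):
        diff = abs(transfer_function(z, lam) - partial_fractions(z, lam))
        print(f"lambda = {lam}, Z = {z}: |difference| = {diff:.2e}")
```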

[^21]

The stability of $H(Z)$ implies that the four terms of $H(Z)$ have a common region of convergence. Therefore, the first two terms must represent a left-sided sequence so that their region of convergence extends inwards, whereas the last two terms must represent a right-sided sequence so that their region of convergence extends outwards.

By using inverse $Z$ transform tables and knowing the region of convergence of each term, the inverse $Z$ transform of $H(Z)$ is obtained

$$
\begin{align*}
& h[n]=-A r^{n-1} u[-(n-1)-1]-B(-r)^{n-1} u[-(n-1)-1] \\
&+C\left(\frac{-1}{r}\right)^{n-1} u[n-1]+D\left(\frac{1}{r}\right)^{n-1} u[n-1] \tag{A.14}
\end{align*}
$$

Since $A=-B=C r^{2}=-D r^{2}$, (A.14) is rewritten as

$$
\begin{equation*}
h[n]=\frac{-A r^{-|n|}}{r}\left(1+(-1)^{n}\right) . \tag{A.15}
\end{equation*}
$$

Therefore,

$$
\begin{equation*}
h[n]=\frac{h[0]}{2} r^{-|n|}\left(1+(-1)^{n}\right) . \tag{A.16}
\end{equation*}
$$

Eq. (A.16) represents the impulse response of the spatial filter of Fig. 3.6 (b).
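The closed form (A.16) can be cross-checked by solving the node equations (A.2) directly on a long but finite network. In the Python sketch below, a unit impulse is applied at node 0. Because (A.2) couples only nodes that are two positions apart, the even-numbered nodes form a tridiagonal system, which is solved with the Thomas algorithm. The truncation length, the value of $\lambda$, and the function names are illustrative assumptions, and $h[0] = -2A/r$ is taken from (A.13) and (A.15).

```python
import math


def closed_form(n, lam):
    """Impulse response (A.16) with h[0] = -2A/r = (r^2 - 1) / (r^2 + 1)."""
    r = math.exp(0.5 * math.acosh(1.0 + 1.0 / lam))   # (A.11)
    h0 = (r**2 - 1.0) / (r**2 + 1.0)
    parity = 2 if n % 2 == 0 else 0                   # 1 + (-1)^n
    return 0.5 * h0 * r ** (-abs(n)) * parity


def ladder_response(lam, half_span=40):
    """Solve (A.2) for the even nodes of a truncated network with i_in = 1 at n = 0."""
    size = 2 * half_span + 1
    diag, off = 1.0 + lam, -0.5 * lam
    rhs = [0.0] * size
    rhs[half_span] = 1.0
    c_prime = [0.0] * size
    d_prime = [0.0] * size
    c_prime[0] = off / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, size):                          # forward elimination
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom
    x = [0.0] * size
    x[-1] = d_prime[-1]
    for i in range(size - 2, -1, -1):                 # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return {2 * (k - half_span): x[k] for k in range(size)}


lam = 1.0
numeric = ladder_response(lam)
for n in (0, 2, 4, -6):
    print(n, numeric[n], closed_form(n, lam))
```

Odd-numbered nodes receive no current in this model, in agreement with the factor $1+(-1)^{n}$ in (A.16).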

## Bibliography

[1] A. Abidi, "The Path to the Software-Defined-Radio Receiver," IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 954-966, May 2007.
[2] C. Sander, M. Clara, A. Hartig, and F. Kuttner, "A 6-bit 1.2-GS/s low-power flash-ADC in $0.13 \mu \mathrm{~m}$ digital CMOS technology," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1499-1505, July 2005.
[3] K. Azadet, E. F. Haratsch, H. Kim, F. Saibi, J. H. Saunders, M. Shaffer, and L. Song, "Equalization and FEC techniques for optical transceiver," IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 317-327, Mar. 2002.
[4] J. Lee, A. Leven, J. Weiner, Y. Baeyens, Y. Yang, W.-J. Sung, J. Frackoviak, R. Kopf, and Y.-K. Chen, "A 6-b 12-GSamples/s track-and-hold amplifier in InP DHBT technology," IEEE J. Solid-State Circuits, vol. 38, no. 9, pp. 1533-1539, Sept. 2003.
[5] J. Lee, P. Roux, U.-V. Koc, T. Link, Y. Baeyens, and Y.-K. Chen, "A 5-b 10GSample/s A/D Converter for 10-Gb/s Optical Receivers," IEEE J. Solid-State Circuits, vol. 39, no. 10, pp. 1671-1679, Oct. 2004.
[6] P. Schvan, D. Pollex, S.-C. Wang, C. Falt, and N. Ben-Hamida, "A 22 GS/s 5b ADC in $0.13 \mu \mathrm{~m}$ SiGe BiCMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 572-573.
[7] H.-M. Bae, J. B. Ashbrook, J. Park, N. R. Shanbhag, A. C. Singer, and S. Chopra, "An MLSE Receiver for Electronic Dispersion Compensation of OC-192 Fiber Links," IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2541-2554, Nov. 2006.
[8] C.-K. K. Yang, V. Stojanovic, S. Modjtahedi, M. Horowitz, and W. Ellersick, "A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in $0.25-\mu \mathrm{m}$ CMOS," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1684-1692, Nov. 2001.
[9] K. Poulton, R. Neff, B. Setterberg, B. Wuppermann, T. Kopley, R. Jewett, J. Pernillo, C. Tan, and A. Montijo, "A $20 \mathrm{GS} / \mathrm{s} 8 \mathrm{~b}$ ADC with a 1 MB memory in $0.18 \mu \mathrm{~m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2003, pp. 318-496.
[10] W. Ellersick, C.-K. K. Yang, M. Horowitz, and W. Dally, "GAD: A 12-GS/s CMOS 4-bit A/D Converter for an Equalized Multi-Level Link," in Proc. of the VLSI Symposium, June 1999, pp. 49-52.
[11] A. Ismail and M. Elmasry, "Analog-to-Digital Conversion for SONET OC-192," in Proc. Intl. Conf. Syst. on Chip, 2004, pp. 41-44.
[12] Y. Chiu, B. Nikolic, and P. Gray, "Scaling of Analog-to-Digital Converters into Ultra-Deep-Submicron CMOS," in Proc. IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, 2005, pp. 375-382.
[13] M. J. Pelgrom, H. P. Tuinhout, and M. Vertregt, "Transistor Matching in Analog CMOS applications," in IEDM Tech. Dig., Dec. 1998, pp. 915-918.
[14] K. Kattmann and J. Barrow, "A technique for reducing differential non-linearity errors in flash A/D converters," in Proc. ISSCC Dig. Tech. Papers, Feb. 1991, pp. 170-171.
[15] H. Pan and A. A. Abidi, "Spatial filtering in flash A/D converter," IEEE Trans. Circuits Syst. II, vol. 50, no. 8, pp. 424-436, Aug. 2003.
[16] M. Choi and A. A. Abidi, "A 6-b 1.3-Gsample/s A/D converter in $0.35 \mu \mathrm{~m}$ CMOS," IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 1847-1857, Dec. 2001.
[17] K. Sushihara, H. Kimura, Y. Okamoto, K. Nishimura, and A. Matsuzawa, "A 6b 800MSample/s CMOS A/D Converter," in Proc. ISSCC Dig. Tech. Papers, Feb. 2000, pp. 428-429.
[18] G. Geelen, "A 6b 1.1GSample/s CMOS A/D Converter," in Proc. ISSCC Dig. Tech. Papers, Feb. 2001, pp. 128-129.
[19] K. Bult, "Analog broadband communication circuits in pure digital deep sub-micron CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 1999, pp. 76-77.
[20] P. C. S. Scholtens and M. Vertregt, "A 6-b 1.6 Gsamples/s flash ADC in $0.18 \mu \mathrm{~m}$ CMOS using averaging termination," in Proc. ISSCC Dig. Tech. Papers, Feb. 2002, pp. 168-475.
[21] X. Jiang, Z. Wang, and M. F. Chang, "A 2GS/s 6b ADC in $0.18 \mu \mathrm{~m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2003, pp. 322-323.
[22] J. Doernberg, H. Lee, and D. Hodges, "Full-Speed testing of A/D Converters," IEEE J. Solid-State Circuits, vol. 19, no. 6, pp. 820-827, Dec. 1984.
[23] D. A. Johns and K. Martin, Analog Integrated circuit design. USA: John Wiley and sons, Inc, 1997.
[24] R. V. de Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters. MA, USA: Kluwer Academic Publishers, 2003.
[25] K. Uyttenhove and M. S. Steyaert, "Speed-power-Accuracy tradeoff in high-speed CMOS ADC's," IEEE Trans. Circuits Syst. II, vol. 49, no. 4, pp. 280-287, Apr. 2002.
[26] R. Poujois, B. Baylac, D. Barbier, and J. M. Ittel, "Low-Level MOS Transistor Amplifier Using Storage Techniques," in Proc. ISSCC Dig. Tech. Papers, Feb. 1973, pp. 152-153.
[27] B. Razavi and B. Wooley, "Design Techniques for High-Speed, High-Resolution Comparators," IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1916-1926, Dec. 1992.
[28] J. Spalding and D. Dalton, "A 200MSample/s 6b Flash ADC in $0.6 \mu \mathrm{~m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 1996, pp. 320-321.
[29] A. G. W. Venes and R. V. de Plassche, "An 80-MHz, 80-mW, 8-b CMOS Folding A/D Converter with distributed Track-and-Hold Preprocessing," IEEE J. Solid-State Circuits, vol. 31, no. 12, pp. 1846-1853, Dec. 1996.
[30] K. Uyttenhove and M. S. Steyaert, "A 1.8-V 6-Bit 1.3-GHz Flash ADC in $0.25-\mu \mathrm{m}$ CMOS," IEEE J. Solid-State Circuits, vol. 38, no. 7, pp. 1115-1122, July 2003.
[31] P. Vorenkamp and R. Roovers, "A 12-b, 60-MSample/s cascaded folding and interpolating ADC," IEEE J. Solid-State Circuits, vol. 32, no. 12, pp. 1876-1886, Dec. 1997.
[32] R. J. V. de Plassche and P. Baltus, "An 8-bit 100-MHz full-nyquist analog-to-digital converter," IEEE J. Solid-State Circuits, vol. 23, no. 6, pp. 1334-1344, Dec. 1988.
[33] K. Kusumoto, A. Matsuzawa, and K. Murata, "A 10-b 20-MHz 30-mW Pipelined Interpolating CMOS ADC," IEEE J. Solid-State Circuits, vol. 28, no. 12, pp. 1200-1206, Dec. 1993.
[34] B. P. Brandt and J. Lutsky, "A 75-mW, 10-b, 20-MSPS CMOS Subranging ADC with 9.5 Effective Bits at Nyquist," IEEE J. Solid-State Circuits, vol. 34, no. 12, pp. 1788-1795, Dec. 1999.
[35] J. Mulder, C. M. Ward, C.-H. Lin, D. Kruse, J. R. Westra, M. L. Lugthart, E. Arslan, R. J. van de Plassche, K. Bult, and F. M. van der Goes, "A 21mW 8b 125MS/s ADC Occupying $0.09 \mathrm{~mm}^{2}$ in $0.13 \mu \mathrm{~m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2004, pp. 260-261.
[36] B. Nauta and A. G. W. Venes, "A 70-MS/s 110-mW 8-b CMOS folding and interpolating A/D converter," IEEE J. Solid-State Circuits, vol. 30, no. 12, pp. 1302-1308, Dec. 1995.
[37] B. Razavi, Principles of data conversion system design. New York: IEEE Press, 1995.
[38] K. Bult and A. Buchwald, "An embedded $240-\mathrm{mW} 10-\mathrm{b} 50-\mathrm{MS} / \mathrm{s}$ CMOS ADC in 1-mm²," IEEE J. Solid-State Circuits, vol. 32, no. 12, pp. 1887-1895, Dec. 1997.
[39] R. Taft, C. Menkus, M. Rosaria, O. Hidri, and V. Pons, "A 1.8-V 1.6-GSamples/s 8-b Self-Calibrating Folding ADC with 7.26 ENOB at Nyquist Frequency," IEEE J. Solid-State Circuits, vol. 39, no. 12, pp. 2107-2115, Dec. 2004.
[40] S. Park, Y. Palaskas, and M. P. Flynn, "A 4GS/s Flash ADC in $0.18 \mu \mathrm{~m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 566-567.
[41] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16pJ/Conversion-Step 2.5mW 1.25GS/s 4b ADC in 90nm Digital CMOS Process," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 566-567.
[42] P. M. Figueiredo, P. Cardoso, A. Lopes, C. Fachada, N. Hamanishi, K. Tanabe, and J. Vital, "A 90nm CMOS 1.2V 1GS/s Two Step Subranging ADC," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 568-569.
[43] D. Fu, K. C. Dyer, S. H. Lewis, and P. J. Hurst, "A Digital Background Calibration Technique for Time-Interleaved Analog-to-Digital Converters," IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 1904-1911, Dec. 1998.
[44] K. C. Dyer, D. Fu, S. H. Lewis, and P. J. Hurst, "An Analog Background Calibration Technique for Time-Interleaved Analog-to-Digital Converters," IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 1912-1919, Dec. 1998.
[45] D. J. Huber, R. J. Chandler, and A. A. Abidi, "A 10b 160MS/s 84mW 1V Subranging ADC in 90nm CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2007, pp. 454-455.
[46] T. Sepke, J. K. Fiorenza, C. G. Sodini, P. Holloway, and H.-S. Lee, "Comparator-based Switched Capacitor Circuits for Scaled CMOS Technologies," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 220-221.
[47] L. Brooks and H.-S. Lee, "A Zero-Crossing based 8b 200MS/s pipelined ADC," in Proc. ISSCC Dig. Tech. Papers, Feb. 2007, pp. 460-461.
[48] P. Bogner, F. Kuttner, C. Kropf, T. Hartig, M. Burian, and H. Eul, "A 14b 100MS/s digitally self-calibrated pipelined ADC in $0.13-\mu \mathrm{m}$ CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 224-225.
[49] S.-C. Lee, Y.-D. Jeon, K.-D. Kim, J.-K. Kwon, J. Kim, J.-W. Moon, and W. Lee, "A 10b 205MS/s 1mm$^{2}$ 90nm CMOS Pipeline ADC for Flat-Panel Display Applications," in Proc. ISSCC Dig. Tech. Papers, Feb. 2007, pp. 458-459.
[50] A. Varzaghani and C.-K. K. Yang, "A 600-MS/s 5-Bit Pipeline A/D Converter Using Digital Reference Calibration," IEEE J. Solid-State Circuits, vol. 41, no. 2, pp. 310-319, Feb. 2006.
[51] D.-L. Shen and T.-C. Lee, "A 6-bit 800-MS/s Pipelined A/D Converter With Open-Loop Amplifiers," IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 258-268, Feb. 2007.
[52] B. Murmann, EECS 247-Analysis and Design of VLSI-Analog-Digital Interface Integrated Circuits-Course notes, 2003.
[53] X. Jiang and M. F. Chang, "A 1-GHz signal bandwidth 6-bit CMOS ADC with power-efficient averaging," IEEE J. Solid-State Circuits, vol. 40, no. 2, pp. 532-535, Feb. 2005.
[54] K. Bult, "Analog Design in Deep Sub-Micron CMOS," in Proc. of the 26th European Solid-State Circuits Conference, 2000, pp. 126-132.
[55] W. Ellersick, Data Converters for high speed links. Stanford University: Ph.D. Dissertation, 2001.
[56] P. C. S. Scholtens and M. Vertregt, "A 6-b 1.6 Gsamples/s flash ADC in $0.18 \mu \mathrm{~m}$ CMOS using averaging termination," IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1599-1609, Dec. 2002.
[57] Z.-Y. Wang, H. Pan, C.-M. Chang, H.-R. Yu, and M. F. Chang, "A 600 MSPS 8-bit Folding ADC in $0.18 \mu \mathrm{~m}$ CMOS," in Proc. IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, 2004, pp. 424-427.
[58] P. C. S. Scholtens and M. Vertregt, "A 6-b 1.6 Gsamples/s flash ADC in $0.18 \mu \mathrm{~m}$ CMOS using averaging termination," in Proc. ISSCC Dig. Tech. Papers, Feb. 2002, pp. 168-169.
[59] G. Wegmann, E. A. Vittoz, and F. Rahali, "Charge injection in analog MOS switches," IEEE J. Solid-State Circuits, vol. 22, pp. 1091-1097, Dec. 1987.
[60] J. Rabaey, Digital Integrated Circuits: A Design Perspective. New Jersey: Prentice-Hall International, Inc., 1996.
[61] C. Mangelsdorf, "A 400-MHz Input Flash Converter With Error Correction," IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 184-191, Feb. 1990.
[62] Y. Tamba and K. Yamakido, "A CMOS 6b 500MSample/s ADC for a Hard Disk Drive Read Channel," in Proc. ISSCC Dig. Tech. Papers, Feb. 1999, pp. 324-325.
[63] D. Draxelmayr, "A 6b 600MHz 10mW ADC array in digital 90nm CMOS," in Proc. ISSCC Dig. Tech. Papers, Feb. 2004, pp. 264-265.
[64] Rohde and Schwarz, R&S SMT Signal Generator - Data sheet, 2006.

## Publications resulting from this work

1. A. H. Ismail and M. Elmasry, "A 6-bit 1.6-GS/s Low Power Wide Bandwidth Flash ADC Converter in $0.13-\mu \mathrm{m}$ CMOS Technology," IEEE Journal of Solid-State Circuits (submitted).
2. A. H. Ismail and M. Elmasry, "Analysis of The Flash ADC Bandwidth-Accuracy Product in Deep Submicron CMOS Technologies," IEEE Transactions on Circuits and Systems II (submitted).
3. A. H. Ismail and M. Elmasry, "A Termination Technique for The Averaging Network of Flash ADC's," in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), May 2006, pp. 3321-3324.
4. A. Ismail and M. Elmasry, "Analog-to-Digital Conversion for SONET OC-192," in Proc. IEEE International System-on-Chip Conference (SOCC), Sept. 2004, pp. 41-44.
5. A. Ismail and M. Elmasry, "On the design of low power MCML based ring oscillators," in Proc. IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), May 2004, pp. 2383-2386.
6. A. Ismail and M. Elmasry, "A Low Power Design Approach for MOS Current Mode Logic," in Proc. IEEE International System-on-Chip Conference (SOCC), Sept. 2003, pp. 143-146.
7. A. Ismail and M. Elmasry, "Design of MOS Current Mode Logic," in Proc. Micronet Annual Workshop, Sept. 2003, pp. 49-50.

[^0]:    ${ }^{1}$ The threshold voltage mismatch is the main contributor to the first source of error, unless the transistor is biased at a large overdrive voltage.

[^1]:    ${ }^{2}$ Subranging ADC architecture is explained in Section 2.5.

[^2]:    ${ }^{3}$ Recall that static voltage mismatch depends on the DC operating point.

[^3]:    ${ }^{4}$ This is assuming that a 1-bit range overlap is applied to allow digital correction [37] [23].

[^4]:    ${ }^{5}$ Pipelined ADCs are avoided in applications that do not tolerate latency, such as ADCs used in control systems with a feedback loop.

[^5]:    ${ }^{6}$ ADCs with more than 8 bits of resolution.
    ${ }^{7}$ Thermal noise is proportional to $\frac{K T}{C}$.
    ${ }^{8}$ ADCs with resolution of 6 bits and less.

[^6]:    ${ }^{1}$ The threshold voltage mismatch is the dominant source of offset. Therefore $\beta$ mismatch is ignored.

[^7]:    ${ }^{2}$ The parameter $\alpha$ has a slight dependence on technology, but this can be ignored for simplicity.

[^8]:    ${ }^{3}$ Time-interleaved ADCs are driven by $50 \%$ duty cycles with different uniformly spaced phases.

[^9]:    ${ }^{4}$ This is the current corresponding to the input referred offset voltage.

[^10]:    ${ }^{5}$ The conclusions that follow are independent of the value of $W_{L i n}$.

[^11]:    ${ }^{6}$ The preamplifiers in the $\times 2$ interpolating architecture are the same size as those used in the full flash case.

[^12]:    ${ }^{1}$ The interface amplifier used in the triple cross-connection method also has a larger input capacitance than that of the regular preamplifiers [21].

[^13]:    ${ }^{1}$ This value for $T_{a}$ assumes that each of the sampling clock rise and fall times consumes $15 \%$ of the period.
    ${ }^{2}$ This is the small-signal 3-dB bandwidth. The actual input signal is a large signal and would experience a smaller bandwidth.

[^14]:    ${ }^{3}$ The selection of the replica source follower size is a tradeoff between achieving low power dissipation and maintaining good matching between the main and replica circuits.

[^15]:    ${ }^{4}$ Eq. (5.2) assumes an ideal (linear) quantizer; that is to say, it has zero INL and zero DNL and does not contribute distortion.

[^16]:    ${ }^{5}$ This is achieved by reducing W and L by the same ratio, keeping ( $\mathrm{W} / \mathrm{L}$ ) the same.

[^17]:    ${ }^{6}$ The offsets of the CMOS latches are divided by the large regeneration gain of the latched comparator. Therefore, the CMOS latch offsets do not affect the overall performance.

[^18]:    ${ }^{1}$ Ferrite beads present a zero resistance for DC voltages and a high resistance for high frequency signals.

[^19]:    ${ }^{2}$ A bandpass filter is usually used in ADC testing, since it also filters the wideband noise and subharmonics. However, based on the phase noise profile in the signal generator data sheet [64], the integrated noise over a bandwidth of 1.5 GHz is estimated to remain below -53 dBc. Therefore, low pass filters were used instead of the more expensive tunable bandpass filters.
    ${ }^{3}$ This value is better than that required to test the 6-bit ADC with an input analog signal at 1.5 GHz.

[^20]:    ${ }^{1}$ Using background calibration would limit the sampling speed.

[^21]:    ${ }^{1}$ The constant D is not to be mistaken for the number of dummies D in Subsection 4.2.1.

