
River Publishers Series in Signal, Image and Speech Processing

Digital Filter Design and Realization



Takao Hinamoto and Wu-Sheng Lu


Digital Filter Design and Realization
Analysis, design, and realization of digital filters have experienced major
developments since the 1970s, and have now become an integral part of the
theory and practice in the field of contemporary digital signal processing.
Digital Filter Design and Realization is written to present an up-to-date
and comprehensive account of the analysis, design, and realization of digital
filters. It is intended to be used as a text for graduate students as well as
a reference book for practitioners in the field. Prerequisites for this book
include basic knowledge of calculus, linear algebra, signal analysis, and
linear system theory.
Technical topics discussed in the book include:
• Discrete-Time Systems and z-Transformation
• Stability and Coefficient Sensitivity
• State-Space Models
• FIR Digital Filter Design
• Frequency-Domain Digital Filter Design
• Time-Domain Digital Filter Design
• Interpolated and Frequency-Response-Masking FIR Digital Filter Design
• Composite Digital Filter Design
• Finite Word Length Effects
• Coefficient Sensitivity Analysis and Minimization
• Error Spectrum Shaping
• Roundoff Noise Analysis and Minimization
• Generalized Transposed Direct-Form II
• Block-State Realization
RIVER PUBLISHERS SERIES IN SIGNAL, IMAGE AND
SPEECH PROCESSING

Series Editors
MONCEF GABBOUJ
Tampere University of Technology
Finland

THANOS STOURAITIS
University of Patras
Greece

Indexing: All books published in this series are submitted to Thomson Reuters Book
Citation Index (BkCI), CrossRef and to Google Scholar.

The “River Publishers Series in Signal, Image and Speech Processing” is a series
of comprehensive academic and professional books which focus on all aspects of
the theory and practice of signal processing. Books published in the series include
research monographs, edited volumes, handbooks and textbooks. The books provide
professionals, researchers, educators, and advanced students in the field with an
invaluable insight into the latest research and developments.

Topics covered in the series include, but are by no means restricted to the
following:

• Signal Processing Systems


• Digital Signal Processing
• Image Processing
• Signal Theory
• Stochastic Processes
• Detection and Estimation
• Pattern Recognition
• Optical Signal Processing
• Multi-dimensional Signal Processing
• Communication Signal Processing
• Biomedical Signal Processing
• Acoustic and Vibration Signal Processing
• Data Processing
• Remote Sensing
• Signal Processing Technology
• Speech Processing
• Radar Signal Processing

For a list of other books in this series, visit www.riverpublishers.com


Digital Filter Design and Realization

Takao Hinamoto
Hiroshima University
Japan

Wu-Sheng Lu
University of Victoria
Canada
Published, sold and distributed by:
River Publishers
Alsbjergvej 10
9260 Gistrup
Denmark

River Publishers
Lange Geer 44
2611 PW Delft
The Netherlands

Tel.: +45369953197
www.riverpublishers.com

ISBN: 978-87-93519-64-0 (Hardback)
ISBN: 978-87-93519-34-3 (Ebook)

©2017 River Publishers

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.
Contents

Preface xvii

List of Figures xix

List of Tables xxv

List of Abbreviations xxvii

1 Introduction 1
1.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Terminology for Signal Analysis and Typical Signals . . . . 1
1.2.1 Terminology for Signal Analysis . . . . . . . . . . . 1
1.2.2 Examples of Typical Signals . . . . . . . . . . . . . 3
1.3 Digital Signal Processing . . . . . . . . . . . . . . . . . . . 4
1.3.1 General Framework for Digital Signal Processing . . 4
1.3.2 Advantages of Digital Signal Processing . . . . . . . 5
1.3.3 Disadvantages of Digital Signal Processing . . . . . 5
1.4 Analysis of Analog Signals . . . . . . . . . . . . . . . . . . 6
1.4.1 The Fourier Series Expansion of Periodic Signals . . 6
1.4.2 The Fourier Transform . . . . . . . . . . . . . . . . 7
1.4.3 The Laplace Transform . . . . . . . . . . . . . . . . 8
1.5 Analysis of Discrete-Time Signals . . . . . . . . . . . . . . 10
1.5.1 Sampling an Analog Signal . . . . . . . . . . . . . . 10
1.5.2 The Discrete-Time Fourier Transform . . . . . . . . 11
1.5.3 The Discrete Fourier Transform (DFT) . . . . . . . 13
1.5.4 The z-Transform . . . . . . . . . . . . . . . . . . . 13
1.6 Sampling of Continuous-Time Sinusoidal Signals . . . . . . 14
1.7 Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.8 Sampling Theorem . . . . . . . . . . . . . . . . . . . . . . 17
1.9 Recovery of an Analog Signal . . . . . . . . . . . . . . . . 20
1.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


2 Discrete-Time Systems and z-Transformation 23


2.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Discrete-Time Signals . . . . . . . . . . . . . . . . . . . . . 23
2.3 z-Transform of Basic Sequences . . . . . . . . . . . . . . . 25
2.3.1 Fundamental Transforms . . . . . . . . . . . . . . . 25
2.3.2 Properties of z-Transform . . . . . . . . . . . . . . 27
2.4 Inversion of z-Transforms . . . . . . . . . . . . . . . . . . . 29
2.4.1 Partial Fraction Expansion . . . . . . . . . . . . . . 30
2.4.2 Power Series Expansion . . . . . . . . . . . . . . . 31
2.4.3 Contour Integration . . . . . . . . . . . . . . . . . . 32
2.5 Parseval’s Theorem . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Discrete-Time Systems . . . . . . . . . . . . . . . . . . . . 34
2.7 Difference Equations . . . . . . . . . . . . . . . . . . . . . 37
2.8 State-Space Descriptions . . . . . . . . . . . . . . . . . . . 40
2.8.1 Realization 1 . . . . . . . . . . . . . . . . . . . . . 40
2.8.2 Realization 2 . . . . . . . . . . . . . . . . . . . . . 41
2.9 Frequency Transfer Functions . . . . . . . . . . . . . . . . 42
2.9.1 Linear Time-Invariant Causal Systems . . . . . . . . 42
2.9.2 Rational Transfer Functions . . . . . . . . . . . . . 43
2.9.3 All-Pass Digital Filters . . . . . . . . . . . . . . . . 45
2.9.4 Notch Digital Filters . . . . . . . . . . . . . . . . . 48
2.9.5 Doubly Complementary Digital Filters . . . . . . . 53
2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3 Stability and Coefficient Sensitivity 57


3.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.2 Stability in Terms of Poles . . . . . . . . . . . . . . 58
3.2.3 Schur-Cohn Criterion . . . . . . . . . . . . . . . . . 60
3.2.4 Schur-Cohn-Fujiwara Criterion . . . . . . . . . . . 60
3.2.5 Jury-Marden Criterion . . . . . . . . . . . . . . . . 61
3.2.6 Stability Triangle of Second-Order Polynomials . . . 62
3.2.7 Lyapunov Criterion . . . . . . . . . . . . . . . . . . 62
3.3 Coefficient Sensitivity . . . . . . . . . . . . . . . . . . . . 64
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4 State-Space Models 67
4.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Controllability and Observability . . . . . . . . . . . . . . . 67
4.3 Transfer Function . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.1 Impulse Response . . . . . . . . . . . . . . . . . . 70
4.3.2 Faddeev’s Formula . . . . . . . . . . . . . . . . . . 71
4.3.3 Cayley-Hamilton’s Theorem . . . . . . . . . . . . . 73
4.4 Equivalent Systems . . . . . . . . . . . . . . . . . . . . . . 73
4.4.1 Equivalent Transformation . . . . . . . . . . . . . . 73
4.4.2 Canonical Forms . . . . . . . . . . . . . . . . . . . 74
4.4.3 Balanced, Input-Normal, and Output-Normal
State-Space Models . . . . . . . . . . . . . . . . . . 79
4.5 Kalman’s Canonical Structure Theorem . . . . . . . . . . . 81
4.6 Hankel Matrix and Realization . . . . . . . . . . . . . . . . 85
4.6.1 Minimal Realization . . . . . . . . . . . . . . . . . 85
4.6.2 Minimal Partial Realization . . . . . . . . . . . . . 87
4.6.3 Balanced Realization . . . . . . . . . . . . . . . . . 89
4.7 Discrete-Time Lossless Bounded-Real Lemma . . . . . . . 91
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5 FIR Digital Filter Design 97


5.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Filter Classification . . . . . . . . . . . . . . . . . . . . . . 98
5.3 Linear-phase Filters . . . . . . . . . . . . . . . . . . . . . . 100
5.3.1 Frequency Transfer Function . . . . . . . . . . . . . 100
5.3.2 Symmetric Impulse Responses . . . . . . . . . . . . 101
5.3.3 Antisymmetric Impulse Responses . . . . . . . . . . 104
5.4 Design Using Window Function . . . . . . . . . . . . . . . 108
5.4.1 Fourier Series Expansion . . . . . . . . . . . . . . . 108
5.4.2 Window Functions . . . . . . . . . . . . . . . . . . 110
5.4.3 Frequency Transformation . . . . . . . . . . . . . . 111
5.5 Least-Squares Design . . . . . . . . . . . . . . . . . . . . . 114
5.5.1 Quadratic-Measure Minimization . . . . . . . . . . 114
5.5.2 Eigenfilter Method . . . . . . . . . . . . . . . . . . 116
5.6 Analytical Approach . . . . . . . . . . . . . . . . . . . . . 117
5.6.1 General FIR Filter Design . . . . . . . . . . . . . . 117
5.6.2 Linear-Phase FIR Filter Design . . . . . . . . . . . 118
5.7 Chebyshev Approximation . . . . . . . . . . . . . . . . . . 120

5.7.1 The Parks-McClellan Algorithm . . . . . . . . . . . 120


5.7.2 Alternation Theorem . . . . . . . . . . . . . . . . . 121
5.8 Cascaded Lattice Realization of FIR Digital Filters . . . . . 124
5.9 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 128
5.9.1 Least-Squares Design . . . . . . . . . . . . . . . . . 128
5.9.1.1 Quadratic measure minimization . . . . . 128
5.9.1.2 Eigenfilter method . . . . . . . . . . . . . 128
5.9.2 Analytical Approach . . . . . . . . . . . . . . . . . 129
5.9.2.1 General FIR filter design . . . . . . . . . 129
5.9.2.2 Linear-Phase FIR filter design . . . . . . . 130
5.9.3 Chebyshev Approximation . . . . . . . . . . . . . . 131
5.9.4 Comparison of Algorithms’ Performances . . . . . . 132
5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6 Design Methods Using Analog Filter Theory 135


6.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.2 Design Methods Using Analog Filter Theory . . . . . . . . . 135
6.2.1 Lowpass Analog-Filter Approximations . . . . . . . 136
6.2.1.1 Butterworth approximation . . . . . . . . 136
6.2.1.2 Chebyshev approximation . . . . . . . . . 136
6.2.1.3 Inverse-Chebyshev approximation . . . . 137
6.2.1.4 Elliptic approximation . . . . . . . . . . . 138
6.2.2 Other Analog-Filter Approximations
by Transformations . . . . . . . . . . . . . . . . . . 140
6.2.2.1 Lowpass-to-lowpass transformation . . . . 140
6.2.2.2 Lowpass-to-highpass transformation . . . 140
6.2.2.3 Lowpass-to-bandpass transformation . . . 140
6.2.2.4 Lowpass-to-bandstop transformation . . . 141
6.2.3 Design Methods Based on Analog Filter Theory . . . 141
6.2.3.1 Invariant impulse-response method . . . . 141
6.2.3.2 Bilinear-transformation method . . . . . . 143
6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

7 Design Methods in the Frequency Domain 151


7.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2 Design Methods in the Frequency Domain . . . . . . . . . . 151
7.2.1 Minimum Mean Squared Error Design . . . . . . . . 151

7.2.2 An Equiripple Design by Linear Programming . . . 155


7.2.3 Weighted Least-Squares Design with Stability
Constraints . . . . . . . . . . . . . . . . . . . . . . 157
7.2.4 Minimax Design with Stability Constraints . . . . . 161
7.3 Design of All-Pass Digital Filters . . . . . . . . . . . . . . . 164
7.3.1 Design of All-Pass Filters Based on Frequency
Response Error . . . . . . . . . . . . . . . . . . . . 164
7.3.2 Design of All-Pass Filters Based on Phase
Characteristic Error . . . . . . . . . . . . . . . . . . 167
7.3.3 A Numerical Example . . . . . . . . . . . . . . . . 170
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8 Design Methods in the Time Domain 173


8.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.2 Design Based on Extended Pade’s Approximation . . . . . . 175
8.2.1 A Direct Procedure . . . . . . . . . . . . . . . . . . 176
8.2.2 A Modified Procedure . . . . . . . . . . . . . . . . 177
8.3 Design Using Second-Order Information . . . . . . . . . . . 178
8.3.1 A Filter Design Method . . . . . . . . . . . . . . . . 178
8.3.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . 182
8.3.3 An Efficient Algorithm for Solving (8.35) . . . . . . 185
8.4 Least-Squares Design . . . . . . . . . . . . . . . . . . . . . 190
8.5 Design Using State-Space Models . . . . . . . . . . . . . . 196
8.5.1 Balanced Model Reduction . . . . . . . . . . . . . . 196
8.5.2 Stability and Minimality . . . . . . . . . . . . . . . 199
8.6 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 204
8.6.1 Design Based on Extended Pade’s Approximation . . 204
8.6.2 Design Using Second-Order Information . . . . . . 205
8.6.3 Least-Squares Design . . . . . . . . . . . . . . . . . 208
8.6.4 Design Using State-Space Model (Balanced
Model Reduction) . . . . . . . . . . . . . . . . . . 209
8.6.5 Comparison of Algorithms’ Performances . . . . . . 209
8.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9 Design of Interpolated and FRM FIR Digital Filters 213


9.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.2 Basics of IFIR and FRM Filters and CCP . . . . . . . . . . 213

9.2.1 Interpolated FIR Filters . . . . . . . . . . . . . . . . 213


9.2.2 Frequency-Response-Masking Filters . . . . . . . . 214
9.2.3 Convex-Concave Procedure (CCP) . . . . . . . . . 217
9.3 Minimax Design of IFIR Filters . . . . . . . . . . . . . . . 218
9.3.1 Problem Formulation . . . . . . . . . . . . . . . . . 218
9.3.2 Convexification of (9.10) Using CCP . . . . . . . . 219
9.3.3 Remarks on Convexification in (9.13)–(9.14) . . . . 221
9.4 Minimax Design of FRM Filters . . . . . . . . . . . . . . . 222
9.4.1 The Design Problem . . . . . . . . . . . . . . . . . 222
9.4.2 A CCP Approach to Solving (9.23) . . . . . . . . . . 223
9.5 FRM Filters with Reduced Complexity . . . . . . . . . . . . 225
9.5.1 Design Phase 1 . . . . . . . . . . . . . . . . . . . . 225
9.5.2 Design Phase 2 . . . . . . . . . . . . . . . . . . . . 226
9.6 Design Examples . . . . . . . . . . . . . . . . . . . . . . . 227
9.6.1 Design and Evaluation Settings . . . . . . . . . . . 227
9.6.2 Design of IFIR Filters . . . . . . . . . . . . . . . . 227
9.6.3 Design of FRM Filters . . . . . . . . . . . . . . . . 229
9.6.4 Comparisons with Conventional FIR Filters . . . . . 234
9.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

10 Design of a Class of Composite Digital Filters 239


10.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
10.2 Composite Filters and Problem Formulation . . . . . . . . . 240
10.2.1 Composite Filters . . . . . . . . . . . . . . . . . . . 240
10.2.2 Problem Formulation . . . . . . . . . . . . . . . . . 241
10.3 Design Method . . . . . . . . . . . . . . . . . . . . . . . . 243
10.3.1 Design Strategy . . . . . . . . . . . . . . . . . . . . 243
10.3.2 Solving (10.7) with y Fixed to y = yk . . . . . . . . 243
10.3.3 Updating y with x Fixed to x = xk . . . . . . . . . 244
10.3.4 Summary of the Algorithm . . . . . . . . . . . . . . 247
10.4 Design Example and Comparisons . . . . . . . . . . . . . . 248
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

11 Finite Word Length Effects 253


11.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
11.2 Fixed-Point Arithmetic . . . . . . . . . . . . . . . . . . . . 254
11.3 Floating-Point Arithmetic . . . . . . . . . . . . . . . . . . . 257

11.4 Limit Cycles—Overflow Oscillations . . . . . . . . . . . . 257


11.5 Scaling Fixed-Point Digital Filters to Prevent Overflow . . . 260
11.6 Roundoff Noise . . . . . . . . . . . . . . . . . . . . . . . . 262
11.7 Coefficient Sensitivity . . . . . . . . . . . . . . . . . . . . 263
11.8 State-Space Descriptions with Finite Word Length . . . . . 264
11.9 Limit Cycle-Free Realization . . . . . . . . . . . . . . . . . 266
11.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

12 l2 -Sensitivity Analysis and Minimization 273


12.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
12.2 l2 -Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . 274
12.3 Realization with Minimal l2 -Sensitivity . . . . . . . . . . . 277
12.4 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints
Using Quasi-Newton Algorithm . . . . . . . . . . . . . . . 280
12.4.1 l2 -Scaling and Problem Formulation . . . . . . . . . 280
12.4.2 Minimization of (12.18) Subject to l2 -Scaling
Constraints — Using Quasi-Newton Algorithm . . . 281
12.4.3 Gradient of J(x) . . . . . . . . . . . . . . . . . . . 283
12.5 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints
Using Lagrange Function . . . . . . . . . . . . . . . . . . . 285
12.5.1 Minimization of (12.19) Subject to l2 -Scaling
Constraints — Using Lagrange Function . . . . . . 285
12.5.2 Derivation of Nonsingular T from P to Satisfy
l2 -Scaling Constraints . . . . . . . . . . . . . . . . 287
12.6 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 288
12.6.1 Filter Description and Initial l2 -Sensitivity . . . . . 288
12.6.2 l2 -Sensitivity Minimization . . . . . . . . . . . . . 290
12.6.3 l2 -Sensitivity Minimization Subject to l2 -Scaling
Constraints Using Quasi-Newton Algorithm . . . . . 291
12.6.4 l2 -Sensitivity Minimization Subject to l2 -Scaling
Constraints Using Lagrange Function . . . . . . . . 293
12.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

13 Pole and Zero Sensitivity Analysis and Minimization 299


13.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
13.2 Pole and Zero Sensitivity Analysis . . . . . . . . . . . . . . 300
13.3 Realization with Minimal Pole and Zero Sensitivity . . . . . 306

13.3.1 Weighted Pole and Zero Sensitivity Minimization


Without Imposing l2 -Scaling Constraints . . . . . . 306
13.3.2 Zero Sensitivity Minimization Subject to Minimal
Pole Sensitivity . . . . . . . . . . . . . . . . . . . . 309
13.4 Pole Zero Sensitivity Minimization Subject to l2 -Scaling
Constraints Using Lagrange Function . . . . . . . . . . . . 310
13.4.1 l2 -Scaling Constraints and Problem Formulation . . 310
13.4.2 Minimization of (13.37) Subject to l2 -Scaling
Constraints — Using Lagrange Function . . . . . . 310
13.4.3 Derivation of Nonsingular T from P to Satisfy
l2 -Scaling Constraints . . . . . . . . . . . . . . . . 312
13.5 Pole and Zero Sensitivity Minimization Subject to l2 -Scaling
Constraints Using Quasi-Newton Algorithm . . . . . . . . . 312
13.5.1 l2 -Scaling and Problem Formulation . . . . . . . . . 312
13.5.2 Minimization of (13.68) Subject to l2 -Scaling
Constraints — Using Quasi-Newton Algorithm . . . 313
13.5.3 Gradient of J(x) . . . . . . . . . . . . . . . . . . . 314
13.6 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 315
13.6.1 Filter Description and Initial Pole and Zero
Sensitivity . . . . . . . . . . . . . . . . . . . . . . 315
13.6.2 Weighted Pole and Zero Sensitivity Minimization
Without Imposing l2 -Scaling Constraints . . . . . . 316
13.6.3 Weighted Pole and Zero Sensitivity Minimization
Subject to l2 -Scaling Constraints Using Lagrange
Function . . . . . . . . . . . . . . . . . . . . . . . 318
13.6.4 Weighted Pole and Zero Sensitivity Minimization
Subject to l2 -Scaling Constraints Using
Quasi-Newton Algorithm . . . . . . . . . . . . . . . 321
13.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

14 Error Spectrum Shaping 327


14.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
14.2 IIR Digital Filters with High-Order Error Feedback . . . . . 328
14.2.1 N th-Order Optimal Error Feedback . . . . . . . . . 328
14.2.2 Computation of Autocorrelation Coefficients . . . . 330
14.2.3 Error Feedback with Symmetric or Antisymmetric
Coefficients . . . . . . . . . . . . . . . . . . . . . . 332
14.3 State-Space Filter with High-Order Error Feedback . . . . . 338

14.3.1 N th-Order Optimal Error Feedback . . . . . . . . . 338


14.3.2 Computation of Qi for i = 0, 1, · · · , N − 1 . . . . . 341
14.3.3 Error Feedback with Symmetric or Antisymmetric
Matrices . . . . . . . . . . . . . . . . . . . . . . . . 342
14.4 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 349
14.4.1 Example 1 : An IIR Digital Filter . . . . . . . . . . 349
14.4.2 Example 2 : A State-Space Digital Filter . . . . . . 350
14.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

15 Roundoff Noise Analysis and Minimization 357


15.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
15.2 Filters Quantized after Multiplications . . . . . . . . . . . . 358
15.2.1 Roundoff Noise Analysis and Problem
Formulation . . . . . . . . . . . . . . . . . . . . . . 358
15.2.2 Roundoff Noise Minimization Subject to l2 -Scaling
Constraints . . . . . . . . . . . . . . . . . . . . . . 362
15.3 Filters Quantized before Multiplications . . . . . . . . . . . 364
15.3.1 State-Space Model with High-Order Error
Feedback . . . . . . . . . . . . . . . . . . . . . . . 364
15.3.2 Formula for Noise Gain . . . . . . . . . . . . . . . 366
15.3.3 Problem Formulation . . . . . . . . . . . . . . . . . 368
15.3.4 Joint Optimization of Error Feedback
and Realization . . . . . . . . . . . . . . . . . . . . 368
15.3.4.1 The Use of Quasi-Newton Algorithm . . . 368
15.3.4.2 Gradient of J(x) . . . . . . . . . . . . . . 370
15.3.5 Analytical Method for Separate Optimization . . . . 372
15.4 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 373
15.4.1 Filter Description and Initial Roundoff Noise . . . . 373
15.4.2 The Use of Analytical Method
in Section 15.2.2 . . . . . . . . . . . . . . . . . . . 374
15.4.3 The Use of Iterative Method
in Section 15.3.4 . . . . . . . . . . . . . . . . . . . 375
15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

16 Generalized Transposed Direct-Form II Realization 383


16.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
16.2 Structural Transformation . . . . . . . . . . . . . . . . . . . 384

16.3 Equivalent State-Space Realization . . . . . . . . . . . . . . 388


16.3.1 State-Space Realization I . . . . . . . . . . . . . . . 388
16.3.2 State-Space Realization II . . . . . . . . . . . . . . 390
16.3.3 Choice of {Δi } Satisfying l2 -Scaling
Constraints . . . . . . . . . . . . . . . . . . . . . . 392
16.4 Analysis of Roundoff Noise . . . . . . . . . . . . . . . . . . 393
16.4.1 Roundoff Noise of ρ-Operator Transposed
Direct-Form II Structure . . . . . . . . . . . . . . . 393
16.4.2 Roundoff Noise of Equivalent State-Space
Realization . . . . . . . . . . . . . . . . . . . . . . 396
16.5 Analysis of l2 -Sensitivity . . . . . . . . . . . . . . . . . . . 397
16.5.1 l2 -Sensitivity of ρ-Operator Transposed
Direct-Form II Structure . . . . . . . . . . . . . . . 397
16.5.2 l2 -Sensitivity of Equivalent State-Space
Realization . . . . . . . . . . . . . . . . . . . . . . 400
16.6 Filter Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 404
16.6.1 Computation of Roundoff Noise
and l2 -Sensitivity . . . . . . . . . . . . . . . . . . . 404
16.6.2 Choice of Parameters {γi |i = 1, 2, · · · , n} . . . . . 405
16.6.3 Search of Optimal Vector γ = [γ1 , γ2 , · · · , γn ]T . . 405
16.7 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 406
16.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

17 Block-State Realization of IIR Digital Filters 411


17.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
17.2 Block-State Realization . . . . . . . . . . . . . . . . . . . . 412
17.3 Roundoff Noise Analysis and Minimization . . . . . . . . . 419
17.3.1 Roundoff Noise Analysis . . . . . . . . . . . . . . . 419
17.3.2 Roundoff Noise Minimization Subject to l2 -Scaling
Constraints . . . . . . . . . . . . . . . . . . . . . . 422
17.4 l2 -Sensitivity Analysis and Minimization . . . . . . . . . . . 423
17.4.1 l2 -Sensitivity Analysis . . . . . . . . . . . . . . . . 423
17.4.2 l2 -Sensitivity Minimization Subject to l2 -Scaling
Constraints . . . . . . . . . . . . . . . . . . . . . . 429
17.4.2.1 Method 1: using a Lagrange function . . . 429
17.4.2.2 Method 2: using a Quasi-Newton
algorithm . . . . . . . . . . . . . . . . . . 432

17.4.3 l2 -Sensitivity Minimization Without Imposing


l2 -Scaling Constraints . . . . . . . . . . . . . . . . 434
17.4.4 Numerical Experiments . . . . . . . . . . . . . . . 435
17.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 442

Index 445

About the Authors 453


Preface

Analysis, design, and realization of digital filters have experienced major


developments since the 1970s, and have now become an integral part of the
theory and practice in the field of contemporary digital signal processing.
This book is written to present an up-to-date and comprehensive account of
the analysis, design, and realization of digital filters. It is intended to be used
as a text for graduate students as well as a reference book for practitioners
in the field. Prerequisites for this book include basic knowledge of calculus,
linear algebra, signal analysis, and linear system theory.
The text is organized into seventeen chapters which are outlined as follows:
Chapter 1 presents introductory materials on digital signal processing.
Chapter 2 describes several fundamental sequences, the z-transforms of
commonly encountered discrete-time functions, and basic properties of linear
discrete-time systems. Chapter 3 studies stability of recursive digital filters
and their coefficient sensitivity. Chapter 4 deals with mathematical properties
of linear discrete-time dynamical systems, studies transfer functions of linear
systems and their relation to state-space descriptions.
Chapter 5 presents the fundamentals of FIR digital filters and several
methods for the design of FIR digital filters. The next five chapters are related
to the design of digital filters. In Chapter 6, we are concerned with the design
of recursive digital filters using analog filter theory. Chapter 7 presents several
methods for the design of recursive digital filters in the frequency domain while
Chapter 8 investigates several methods for the design of recursive digital filters
in the time domain. Chapter 9 deals with efficient techniques for the design
of interpolated and frequency-response-masking (FRM) FIR digital filters.
Chapter 10 addresses the design of a class of composite digital filters by an
alternating convex optimization strategy to achieve equiripple passband and
least-squares stopband subject to peak-gain constraint.
Chapter 11 studies the finite-word-length effects in the implementation
of recursive digital filters. Chapter 12 deals with the l2 -sensitivity analysis
and minimization of state-space digital filters. Chapter 13 explores pole
and zero sensitivity analysis and minimization of state-space digital filters.


Chapter 14 studies error spectrum shaping in the recursive digital filters


that are described by transfer functions or state-space models. Chapter 15
examines roundoff noise analysis and minimization of state-space digital
filters, and develops a technique for jointly optimizing high-order error
feedback and realization to minimize the roundoff noise gain at the filter’s
output. Chapter 16 presents roundoff noise and l2 -sensitivity analyses of
the generalized transposed direct-form II structure and its equivalent state-
space realization, and describes a procedure for synthesizing the optimal filter
structure or equivalent state-space realization. In Chapter 17, we consider
block-state realization of an IIR digital filter, and examine several properties
of the block-state realization. Analysis of roundoff noise and minimization
of average roundoff noise gain subject to l2 -scaling constraints for block-
state realization are also examined. Moreover, a quantitative analysis on
l2 -sensitivity is performed, and two techniques for minimizing a sensitivity
measure known as average l2 -sensitivity subject to l2 -scaling constraints are
presented.
We wish to express our sincere gratitude to Professor Akimitsu Doi of
Hiroshima Institute of Technology, Hiroshima, Japan, for his kindness and
significant contributions in terms of extensive computer simulations and
drawing many figures for the book.
List of Figures

Figure 1.1 Four types of signals: (a) Analog signal, (b) Sampled-data
signal, (c) Quantized boxcar signal, (d) 3-bit quantized
digital signal. . . . . . . . . . . . . . . . . . . . . . . . . 2
Figure 1.2 General framework for the digital processing of an analog
signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Figure 1.3 Periodic sampling of an analog signal: (a) Sampler,
(b) Analog signal, (c) Sampled-data signal. . . . . . . . . 10
Figure 1.4 Relationship between the continuous-time and discrete-
time frequency variables in periodic sampling. . . . . . . 17
Figure 1.5 Illustration of aliasing. . . . . . . . . . . . . . . . . . . . 17
Figure 1.6 Aliasing of spectral components: (a) Spectrum of a band-
limited analog signal, (b) Spectrum of the discrete-time
signal, (c) (d) Spectrum of the discrete-time signal
with spectral overlap. . . . . . . . . . . . . . . . . . . . . 19
Figure 1.7 Ideal band-limited reconstruction by interpolation. . . . . 22
Figure 2.1 Several fundamental sequences. (a) Unit pulse. (b) Unit
step. (c) Unit ramp. (d) Exponential. (e) Sinusoidal. . . . . 25
Figure 2.2 Typical linear time-invariant systems with identical unit-
pulse responses. (a) Cascade forms. (b) Parallel forms. . . 36
Figure 2.3 Block-diagram symbols for digital filters. (a) Drawer point.
(b) Adder. (c) Constant multiplier. (d) Unit delay. . . . . . 37
Figure 2.4 Direct form II structure of IIR digital filters. . . . . . . . . 38
Figure 2.5 FIR digital filters. . . . . . . . . . . . . . . . . . . . . . . 38
Figure 2.6 Transposed direct form II structure of IIR digital filters. . . 39
Figure 2.7 Transposed form of FIR digital filters. . . . . . . . . . . . 39
Figure 2.8 Transformation of IIR digital filters. . . . . . . . . . . . . 40
Figure 2.9 Transformation of the transposed form of IIR digital
filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Figure 2.10 Two multiplier lattice two-pair for all-pass digital
filter implementation. . . . . . . . . . . . . . . . . . . . . 46
Figure 2.11 Cascaded lattice realization of an nth-order all-pass digital
filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Figure 2.12 Single multiplier lattice two-pair. . . . . . . . . . . . . . . 47
Figure 2.13 Normalized lattice two-pair. . . . . . . . . . . . . . . . . 48


Figure 2.14 Another cascaded lattice realization of an nth-order all-pass


digital filter. . . . . . . . . . . . . . . . . . . . . . . . . . 48
Figure 2.15 Implementation of a single-frequency notch digital
filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Figure 2.16 Lattice structure of a second-order all-pass digital
filter in (2.94b). . . . . . . . . . . . . . . . . . . . . . . . 49
Figure 2.17 Notch frequency ωo , cutoff frequencies ω1 , ω2 and
bandwidth B = ω2 − ω1 of the magnitude response
in (2.97). . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Figure 2.18 Magnitude response of a notch filter in (2.106) where
ωo = 0.3π and ρ = 0.985. . . . . . . . . . . . . . . . . . 52
Figure 2.19 Implementation of the doubly complementary filter pair
as the sum and difference of all-pass digital filters. . . . . 53
Figure 2.20 The phase specification of an all-pass digital
filter A2 (z). . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 3.1 Stability triangle. . . . . . . . . . . . . . . . . . . . . . . 62
Figure 4.1 A state-space model. . . . . . . . . . . . . . . . . . . . . 68
Figure 4.2 Canonical decomposition of a state-space model. . . . . . 84
Figure 5.1 A block diagram of an FIR digital filter. . . . . . . . . . . 98
Figure 5.2 Four types of ideal filters. (a) Ideal lowpass filter. (b) Ideal
highpass filter. (c) Ideal bandpass filter. (d) Ideal
bandstop filter. . . . . . . . . . . . . . . . . . . . . . . . 99
Figure 5.3 Typical magnitude response specifications. (a) Lowpass
filter. (b) Highpass filter. (c) Bandpass filter.
(d) Bandstop filter. . . . . . . . . . . . . . . . . . . . . . 100
Figure 5.4 Symmetric impulse responses. (a) N is odd.
(b) N is even. . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure 5.5 Antisymmetric impulse responses. (a) N is odd.
(b) N is even. . . . . . . . . . . . . . . . . . . . . . . . . 105
Figure 5.6 Ideal lowpass filter characteristics. (a) Magnitude response.
(b) Phase characteristic. . . . . . . . . . . . . . . . . . . . 110
Figure 5.7 Plots of the fixed windows shown with solid lines
for clearness. . . . . . . . . . . . . . . . . . . . . . . . . 112
Figure 5.8 Gain responses of the fixed window functions. . . . . . . . 113
Figure 5.9 Magnitude responses of ideal filters. (a) Ideal lowpass filter.
(b) Ideal highpass filter. (c) Ideal bandpass filter. (d) Ideal
bandstop filter. . . . . . . . . . . . . . . . . . . . . . . . 115
Figure 5.10 Normalized lattice structure of a section. . . . . . . . . . . 126
Figure 5.11 Normalized lattice structure of cascaded two sections. . . . 126
Figure 5.12 Cascaded lattice structure of an FIR digital filter. . . . . . 127
Figure 5.13 The lattice structure of linear-phase FIR digital filters. . . . 128
Figure 5.14 The magnitude response of the resulting filter. . . . . . . . 129

Figure 5.15 The magnitude response of the resulting filter. . . . . . . . 129


Figure 5.16 The magnitude response of the resulting filter. . . . . . . . 130
Figure 5.17 The magnitude response of the resulting filter. . . . . . . . 131
Figure 5.18 The magnitude response of the resulting filter. . . . . . . . 132
Figure 6.1 Magnitude response of the 6th-order IIR filter. . . . . . . . 144
Figure 6.2 Magnitude response of the 8th-order IIR filter. . . . . . . . 149
Figure 7.1 Magnitude response of the 6th-order lowpass IIR filter. . . 154
Figure 7.2 Magnitude response of the 4th-order lowpass IIR filter. . . 158
Figure 7.3 The frequency characteristics of the IIR filter. (a) Magnitude
response (left side). (b) Passband phase response
(right side). . . . . . . . . . . . . . . . . . . . . . . . . . 161
Figure 7.4 The frequency characteristics of the IIR filter. (a) Magnitude
response (left side). (b) Passband phase response
(right side). . . . . . . . . . . . . . . . . . . . . . . . . . 163
Figure 7.5 All-pass filter designed by using frequency response error.
(a) Phase characteristic. (b) Phase error characteristic. . . . 171
Figure 7.6 Amplitude characteristic of the frequency response error. . 171
Figure 7.7 All-pass filter designed with phase characteristic error.
(a) Phase characteristic. (b) Phase error characteristic. . . . 171
Figure 8.1 Time-domain IIR filter design. (a) Output error for least-
squares approximation problem using (8.3). (b) Equation
error for modified least-squares problem
minimizing (8.4). . . . . . . . . . . . . . . . . . . . . . . 174
Figure 8.2 The Gaussian filter. (a) Its impulse response.
(b) Its magnitude response. . . . . . . . . . . . . . . . . . 204
Figure 8.3 Magnitude response of a 3rd-order IIR digital filter designed
by a direct procedure. . . . . . . . . . . . . . . . . . . . . 205
Figure 8.4 Magnitude response of a 3rd-order IIR filter designed
by a modified procedure. . . . . . . . . . . . . . . . . . . 206
Figure 8.5 Magnitude response of a 3rd-order IIR digital filter designed
by (8.36). . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Figure 8.6 Magnitude response of a 4th-order IIR digital filter designed
by (8.63)–(8.65) and (8.27). . . . . . . . . . . . . . . . . 207
Figure 8.7 Magnitude response of a 3rd-order IIR digital filter designed
by a least-squares method. . . . . . . . . . . . . . . . . . 208
Figure 8.8 Magnitude response of a 3rd-order IIR digital filter designed
by balanced model reduction. . . . . . . . . . . . . . . . . 210
Figure 9.1 An IFIR filter. . . . . . . . . . . . . . . . . . . . . . . . . 214
Figure 9.2 Magnitude response of (a) F (z) and (b) F (z L )
with L = 4. . . . . . . . . . . . . . . . . . . . . . . . . . 215
Figure 9.3 Magnitude response of (a) M (z) and (b) H(z) =
F (z L )M (z). . . . . . . . . . . . . . . . . . . . . . . . . 216

Figure 9.4 A single-stage FRM filter. . . . . . . . . . . . . . . . . . . 216


Figure 9.5 Amplitude response (in dB) of the IFIR filters
for Example 9.1 by the proposed algorithm (solid line)
and the method of [30] and [4] (dashed line) in (a) passband
and (b) stopband. . . . . . . . . . . . . . . . . . . . . . . 228
Figure 9.6 Amplitude response (in dB) of the IFIR filters
in Example 9.2 by the proposed algorithm (solid line)
and the method of [9] (dashed line) in (a) entire baseband
and (b) passband. . . . . . . . . . . . . . . . . . . . . . . 230
Figure 9.7 Amplitude response (in dB) of the FRM filters
in Example 9.3 in (a) entire baseband and (b) passband. . . 231
Figure 9.8 Amplitude response (in dB) of the FRM filter
in Example 9.4 in (a) entire baseband and (b) passband. . . 232
Figure 9.9 Amplitude response (in dB) of the FRM filters
in Example 9.5 in (a) entire baseband and (b) passband. . . 233
Figure 9.10 Amplitude response (in dB) of the FRM filters
in Example 9.6 in (a) entire baseband and (b) passband. . . 235
Figure 10.1 A composite filter. . . . . . . . . . . . . . . . . . . . . . . 240
Figure 10.2 Amplitude response of (a) 1 + z −4 and (b) (1 + z −1 )4
(1 + z −2 )4 (1 + z −3 )1 (1 + z −4 )2 . . . . . . . . . . . . . . 242
Figure 10.3 Amplitude response of (a) the prototype filter Hp (z),
(b) the C-filter H(z) and (c) the C-filter H(z) over
the passband. . . . . . . . . . . . . . . . . . . . . . . . . 249
Figure 11.1 Storage of fixed-point numbers. . . . . . . . . . . . . . . 254
Figure 11.2 Quantizer characteristic for rounding of two’s complement
numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Figure 11.3 Probability density function of the quantization error. . . . 255
Figure 11.4 Overflow characteristics. (a) Two’s complement overflow
characteristic. (b) Saturation overflow characteristic. . . . 256
Figure 11.5 Storage of floating-point numbers. . . . . . . . . . . . . . 257
Figure 11.6 Overflow characteristics. (a) Two’s complement overflow
characteristic. (b) Saturation overflow characteristic. . . . 258
Figure 11.7 A system before scaling. . . . . . . . . . . . . . . . . . . 260
Figure 11.8 A system after scaling where s is a scaling factor. . . . . . 261
Figure 11.9 A linear equivalent model of internal quantization
of product. . . . . . . . . . . . . . . . . . . . . . . . . . 262
Figure 11.10 A state-space model. . . . . . . . . . . . . . . . . . . . . 264
Figure 11.11 An actual state-space model. . . . . . . . . . . . . . . . . 265
Figure 11.12 A nonlinear section satisfying (11.41). . . . . . . . . . . . 267
Figure 12.1 The magnitude response of a lowpass digital filter. . . . . 288
Figure 12.2 Profile of S(Pk ) during the first 41 iterations. . . . . . . . 290
Figure 12.3 Profile of J(xk ) during the first 12 iterations. . . . . . . . 292

Figure 12.4 Profiles of S(Pk ) and tr[Kc P−1 k ] during the first
267 iterations. . . . . . . . . . . . . . . . . . . . . . . . . 295
Figure 13.1 Profile of Iγ (P, ξ) during the first 8 iterations
with γ = 1. . . . . . . . . . . . . . . . . . . . . . . . . . 320
Figure 13.2 Profile of Jγ (T ) during the first 19 iterations
with γ = 1. . . . . . . . . . . . . . . . . . . . . . . . . . 322
Figure 14.1 A quantizer with N th-order error feedback. . . . . . . . . 328
Figure 14.2 A state-space model with high-order error feedback. . . . . 339
Figure 15.1 Block diagram of a state-space digital filter. . . . . . . . . 358
Figure 15.2 Block diagram of an actual state-space digital filter
with several noise sources. . . . . . . . . . . . . . . . . . 359
Figure 15.3 Block diagram of a state-space model for noise
propagation. . . . . . . . . . . . . . . . . . . . . . . . . . 359
Figure 15.4 Block diagram of an equivalent state-space digital
filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Figure 15.5 Block diagram of an actual state-space digital
filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Figure 15.6 State-space model with N th-order error feedback
and error feedforward. . . . . . . . . . . . . . . . . . . . 365
Figure 15.7 Profile of Je3 (D, T̂ ) during the first 64 iterations. . . . . . . 376
Figure 15.8 Profile of Je3 (D, T̂ ) during the first 36 iterations. . . . . . . 378
Figure 16.1 Implementation of δ −1 (z). . . . . . . . . . . . . . . . . . 383
Figure 16.2 Transposed direct-form II structure of a ρ operator-based
IIR digital filter. . . . . . . . . . . . . . . . . . . . . . . . 387
Figure 16.3 Implementation of ρ−1 i (z). . . . . . . . . . . . . . . . . . 387
Figure 16.4 Transposed direct-form II structure of a ρ operator-based
IIR digital filter. . . . . . . . . . . . . . . . . . . . . . . . 390
Figure 16.5 Implementation of ρ−1 i (z) in case Δi = 1. . . . . . . . . . 390
Figure 17.1 A state-space model. . . . . . . . . . . . . . . . . . . . . 413
Figure 17.2 Block-state realization using serial-in/parallel-out
and parallel-in/serial-out registers. . . . . . . . . . . . . . 415
Figure 17.3 Flow graph structure of a block-state realization for block
length of three. . . . . . . . . . . . . . . . . . . . . . . . 416
Figure 17.4 Profiles of Si (P)ave and tr[Kc P−1 ] during the first
84 iterations. . . . . . . . . . . . . . . . . . . . . . . . . 439
Figure 17.5 Profile of Jo (x) during the first 22 iterations. . . . . . . . . 440
Figure 17.6 Profile of Si (P)ave during the first 13 iterations. . . . . . 442
List of Tables

Table 1.1 Laplace transform pairs . . . . . . . . . . . . . . . . . . . 9


Table 2.1 z-Transform Pairs . . . . . . . . . . . . . . . . . . . . . . 27
Table 3.1 The Jury-Marden array . . . . . . . . . . . . . . . . . . . . 61
Table 5.1 R−1 = [λij] for 0 ≤ i, j ≤ N̂ and N̂ < M with N̂ = ⌊(N − 1)/2⌋ . . . . . 120
Table 5.2 Performance comparisons among algorithms . . . . . . . . 133
Table 8.1 Convergence of the efficient algorithm using 2nd-order
information . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Table 8.2 Performance comparison among algorithms . . . . . . . . . 210
Table 9.1 Coefficients of F (z) and M (z) for Example 9.2 . . . . . . 231
Table 9.2 Comparisons with conventional FIR filters . . . . . . . . . 235
Table 10.1 Comparisons of C-filter with EPLSS and P-M filters . . . . 250
Table 13.1 Performance comparison in Section 13.6.2 . . . . . . . . . 318
Table 13.2 Lagrange function method subject to scaling constraints . . 320
Table 13.3 Quasi-Newton method subject to scaling constraints . . . . 322
Table 13.4 Performance comparison among four methods . . . . . . . 323
Table 14.1 Suboptimal symmetric and antisymmetric error feedback
coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Table 14.2 Suboptimal symmetric and antisymmetric error feedback
matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Table 14.3 Error feedback for a 4th-order elliptic lowpass filter . . . . 350
Table 14.4 Powers-of-two error feedback for a 4th-order elliptic
lowpass filter . . . . . . . . . . . . . . . . . . . . . . . . . 351
Table 14.5 Error feedback noise gain (dB) for a 4th-order state-space
lowpass filter . . . . . . . . . . . . . . . . . . . . . . . . . 354
Table 14.6 Powers-of-two error feedback noise gain (dB) obtained
by rounding optimal and suboptimal solutions . . . . . . . 355
Table 15.1 Performance comparison . . . . . . . . . . . . . . . . . . . 379
Table 16.1 Performance comparison among various γ . . . . . 409

List of Abbreviations

1-D One-dimensional
2-D Two-dimensional
A/D Analog-to-digital
BFGS Broyden-Fletcher-Goldfarb-Shanno
BP Bandpass
BS Bandstop
CCF Complementary comb filter
CCP Convex-concave procedure
C-filter Composite filter
D/A Digital-to-analog
DFT Discrete Fourier transform
DSP Digital signal processor
ECG Electrocardiograph
EEG Electroencephalogram
EPLSS Equiripple passbands and least-squares stopbands
FIR Finite Impulse Response
FRM Frequency response masking
FWL Finite-word-length
HP Highpass
IDFT Inverse discrete Fourier transform
IFIR Interpolated FIR
IIR Infinite Impulse Response
KKT Karush-Kuhn-Tucker
LBR Lossless bounded real
(L, L) system L-input/L-output state-space model
LP Lowpass
M-D Multidimensional
P-M filter Parks-McClellan filter
P-wave Primary wave
QP Quadratic programming
RGB Red, green, and blue
SDP Semidefinite programming
S/H Sample-and-hold
SISO Single-input/single-output
SOCP Second-order cone programming
S-wave Secondary wave
VLSI Very large scale integrated

1
Introduction

1.1 Preview
This is a book that is primarily concerned with basic concepts and methods
in digital filter design and realization. The recent advances in the theory
and practice of digital signal processing have made it possible to design
sophisticated high-order digital filters and to carry out the large amounts
of computations required for their design and realization. In addition, these
advances in design and realization capability can be achieved at low cost due
to the widespread availability of inexpensive, powerful digital computers and
related hardware. Briefly put, therefore, the focus of this book is the design
and realization of digital filters.
To begin with, we introduce basic terminology for signal analysis and
present an overview of digital signal processing, explaining its advantages and
disadvantages. We then examine the sampling of an analog signal and that of
a continuous-time sinusoidal signal in connection with aliasing. Finally, the
sampling theorem is presented, and the method for recovering an analog signal
from its discrete-time samples is explained. It is shown that if the bandwidth
of an analog signal is finite, in principle the analog signal can be reconstructed
from the samples, provided that the sampling rate is sufficiently high to avoid
aliasing.

1.2 Terminology for Signal Analysis and Typical Signals


1.2.1 Terminology for Signal Analysis
A one-dimensional (1-D) signal is a function of a single scalar variable. A
speech signal is an example of 1-D signals where the variable is time. For 1-D
signals, the variable is usually labeled as time. If the variable is continuous,
the signal is called a continuous-time signal, which is defined at every
instant of time. When the variable is discrete, the signal is called a discrete-time


signal, which is defined at discrete instants of time. A continuous-time signal


with continuous amplitude is called an analog signal. A speech signal is
an example of analog signals. A discrete-time signal with discrete-valued
amplitudes represented by a finite number of digits is called a digital signal.
A digitized music signal stored on a CD-ROM disk is an example of digital
signals. A discrete-time signal with continuous-valued amplitude is called a
sampled-data signal. Therefore, a digital signal is a quantized sampled-data
signal. A continuous-time signal with discrete-valued amplitudes is called
a quantized boxcar signal. Four types of these signals are illustrated in
Figure 1.1, where the abscissa and ordinate axes are time and the amplitude
of signals, respectively.
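For illustration, the following short Python sketch (ours, not the book's) constructs the four signal types of Figure 1.1 from a hypothetical analog signal fa(t) = sin(2πt); the sampling period T and the 3-bit quantizer are assumptions made only for this example.

```python
# A minimal sketch (assumed example, not from the book) of the four signal
# types in Figure 1.1, starting from the analog signal fa(t) = sin(2*pi*t).
import numpy as np

T = 0.125                                  # assumed sampling period
t = np.linspace(0.0, 1.0, 1001)            # dense grid standing in for continuous time
f_analog = np.sin(2 * np.pi * t)           # (a) analog: continuous time, continuous amplitude

k = np.arange(9)                           # discrete time indices k = 0, 1, ..., 8
f_sampled = np.sin(2 * np.pi * k * T)      # (b) sampled-data: discrete time, continuous amplitude

def quantize(x, bits=3):
    """Mid-tread uniform quantizer with step 2 / 2**bits (rounding)."""
    step = 2.0 / 2**bits
    return step * np.round(x / step)

f_boxcar = quantize(f_analog)              # (c) quantized boxcar: continuous time, discrete amplitude
f_digital = quantize(f_sampled)            # (d) digital: a quantized sampled-data signal

print(f_digital)
```

The last line makes the statement in the text concrete: quantizing the sampled-data signal in (b) yields the digital signal in (d).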
Figure 1.1 Four types of signals: (a) Analog signal, (b) Sampled-data signal, (c) Quantized boxcar signal, (d) 3-bit quantized digital signal.

A two-dimensional (2-D) signal is a function of two independent variables. An image signal is an example of 2-D signals where the two independent variables are two spatial variables. A multidimensional (M-D) signal is a function of more than one variable. A black-and-white video signal is an example of 3-D signals where the three independent variables are two spatial variables and time. A color video signal is a 3-channel signal composed of
three 3-D signals representing the three primary colors, namely, red, green,
and blue (RGB). In this book, our focus will be on the processing of 1-D
signals.

1.2.2 Examples of Typical Signals


A. Electrocardiograph (ECG) Signals
Electrical activity of the heart may be represented by ECG signals which are
essentially periodic waveforms.
B. Electroencephalogram (EEG) Signals
The overall effect of the electrical activity due to the random firing of
billions of individual neurons in the brain is represented by EEG signals.
C. Seismic Signals
Seismic signals are generated by the movement of rocks resulting from an
earthquake, a volcanic eruption, or an underground explosion. Specifically,
ground movement causes elastic waves in terms of primary wave (P-wave),
secondary wave (S-wave), and surface wave that propagate through the body
of the earth in all directions from the source of movement.
D. Diesel Engine Signals
In the precision adjustment of diesel engines during production, signal
processing plays an important role. For the efficient operation of the engine,
accurate determination of the topmost point of piston’s movement inside the
cylinder of the engine is required.
E. Speech Signals
A speech signal is formed by exciting the vocal tract and is composed of two
types of sounds, namely, voiced and unvoiced.
F. Musical Sounds
The sound generated by most musical instruments is produced by mechanical
vibrations caused by activating some form of mechanical oscillator that in turn
causes other parts of the instruments to vibrate. All these vibrations together
in a single instrument generate the musical sound.
G. Time Series
A time series is a sequence of data points in a successive order. Time series
occur in business, economics, physical sciences, social sciences, engineering,
medicine, and many other fields. Examples of time series abound, for instance,

the yearly average number of sunspots, daily stock prices, the value of total
monthly exports of a country, the yearly population of animal species in a
certain geographical area, the annual yields per acre of crops in a country, and
the monthly totals of international airline passengers over certain periods.
H. Images and Video Signals
An image is a 2-D signal whose intensity is a function of two spatial variables.
Typical examples are still images, photographs, radar and sonar images, and
medical X-rays. An image sequence, such as that seen on a television, is a 3-D
signal whose image intensity at any point is a function of three variables, i.e.,
two spatial variables and time.

1.3 Digital Signal Processing


1.3.1 General Framework for Digital Signal Processing
Most signals of practical interest, such as speech signals, biological signals,
seismic signals, radar signals, sonar signals, audio and video signals, etc. are
analog. Digital signal processing techniques can be utilized to process analog
signals. In general, digital processing of analog signals consists of three basic
steps:
(1) An analog signal is converted into a digital signal by an A/D converter.
(2) This digital signal is then processed by a digital signal processor, resulting
in a processed digital signal.
(3) The processed digital signal is finally converted into an analog signal by
a D/A converter.
The digital processing of analog signals is illustrated in a block diagram form
in Figure 1.2. Since the amplitude of an analog signal varies with time, a
sample-and-hold (S/H) circuit is employed at first to sample the analog signal
at periodic intervals, and hold the sampled value constant at the input of the
analog-to-digital (A/D) converter to allow accurate digital conversion. The
input to the A/D converter is a staircase-type analog signal, and the output of
the A/D converter is a binary data stream which is processed by a digital signal
processor where the desired signal processing algorithm is implemented. The
output of the digital signal processor is another binary data stream which is


Figure 1.2 General framework for the digital processing of an analog signal.

converted into a staircase-type analog signal by a digital-to-analog (D/A)


converter. A lowpass filter is then used at the output of the D/A converter to
remove all undesired high-frequency components and to deliver a processed
analog signal to the output.
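As a rough illustration of this chain, the following Python sketch (an illustrative assumption, not the book's material) models the S/H and A/D stages by uniform quantization, the digital signal processor by a simple 5-tap moving-average lowpass filter, and the D/A converter by a zero-order hold; the final analog lowpass filter is omitted.

```python
# A schematic model (assumed example) of Figure 1.2: A/D conversion,
# digital processing, and D/A conversion by zero-order hold.
import numpy as np

fs = 1000.0                                      # assumed sampling rate in Hz
n = np.arange(200)
analog_like = np.sin(2 * np.pi * 10.0 * n / fs)  # stand-in for the sampled analog input

# A/D: sample-and-hold followed by uniform quantization to 8 bits
step = 2.0 / 2**8
x = step * np.round(analog_like / step)          # binary data stream (modeled as floats)

# DSP: the desired processing algorithm, here a 5-tap FIR moving average
h = np.ones(5) / 5.0
y = np.convolve(x, h, mode="same")               # processed digital signal

# D/A: a zero-order hold produces a staircase-type analog signal
staircase = np.repeat(y, 10)                     # each sample held over one period

print(staircase.shape)                           # (2000,)
```

In a real system the quantizer and the hold are analog hardware; here they are only modeled numerically to show where each block of Figure 1.2 acts on the signal.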

1.3.2 Advantages of Digital Signal Processing


Digital signal processing offers several advantages:
(1) Operations of digital circuits do not depend on precise values of the
digital signals. Hence a digital circuit is less sensitive to tolerances in
component values, and is fairly independent of external parameters such
as temperature, aging, etc.
(2) Amenable to full integration. In particular, advances in very large scale
integrated (VLSI) circuits have made it possible to integrate highly
sophisticated DSP systems on a single chip.
(3) Since the signals and coefficients describing a processing operation are
represented as binary values, desirable accuracy can be achieved by
simply increasing the wordlength, subject to cost constraint. Moreover,
using floating-point arithmetic can further increase the dynamic range
for signals and coefficients.
(4) Digital implementation permits easy adjustment of processor characteristics during the processing, such as in adaptive filtering.
(5) Digital implementation allows realization of certain characteristics,
which are impossible with analog implementation, such as exact linear
phase and multirate processing.
(6) Digital circuits can be cascaded without loading problems.
(7) Digital signals can be stored almost indefinitely without loss of infor-
mation on various storage media such as magnetic tapes and disks, and
optical disks.

1.3.3 Disadvantages of Digital Signal Processing


There are also disadvantages when digital signal processing techniques are
applied:
(1) Increased system complexity in digital processing of analog signals.
(2) Frequencies available for sampling and digital processing are often
limited.
(3) Digital systems are constructed by active devices that consume electrical
power.

1.4 Analysis of Analog Signals


1.4.1 The Fourier Series Expansion of Periodic Signals
We now consider a signal f(t) that is a periodic function of time.
The Fourier series allows one to express a given periodic function of time as
a sum of an infinite number of sinusoids whose frequencies are harmonically
related. That is,
 ∞
1 
f (t) = a0 + an cos nΩ0 t + bn sin nΩ0 t (1.1)
2
n=1

where Ω0 = 2π/T0 with T0 the period of the signal is called the fundamental
frequency, and the expansion coefficients an and bn are given by
 T0 /2
2
an = f (t) cos nΩ0 t dt for n = 0, 1, 2, · · ·
T0 −T0 /2
 T0 /2
2
bn = f (t) sin nΩ0 t dt for n = 1, 2, 3, · · ·
T0 −T0 /2

Equation (1.1) is called the sine-cosine-form of the Fourier series. Using


Euler’s formula, we can write

an cos nΩ0 t + bn sin nΩ0 t = cn ejnΩ0 t + c−n e−jnΩ0 t (1.2)

where
1  1 
cn = an − jbn , c−n = an + jbn
2 2
By substituting (1.2) into (1.1), we obtain

  
f (t) = c0 + cn ejnΩ0 t + c−n e−jnΩ0 t
n=1

(1.3)

= cn ejnΩ0 t
n=−∞

This expression is called the complex form of the Fourier series. From (1.1)
and (1.2), it follows that
 T0 /2
1
cn = f (t)e−jnΩ0 t dt for n = 0, ±1, ±2, · · · (1.4)
T0 −T0 /2
1.4 Analysis of Analog Signals 7

The magnitude |cn | and phase angle ∠cn = − tan−1 (bn /an ) of cn are called
the magnitude spectrum and phase spectrum, respectively. In the sequel, we
shall use a two-way arrow to represent a Fourier series pair, for instance
f (t) ←→ {cn } (1.5)

1.4.2 The Fourier Transform


A signal that is a function of time can be represented by a combination of
sinusoids and co-sinusoids of various frequencies. In such a depiction, an
infinite number of terms are usually employed. It is called a frequency domain
representation and known as the Fourier transform, which has been a powerful
tool for the analysis and design of filters.
The Fourier transform consists of a pair of integral relations
 ∞
F (Ω) = F[f (t)] = f (t)e−jΩt dt for − ∞ < Ω < ∞ (1.6)
−∞

and  ∞
−1 1
f (t) = F [F (Ω)] = F (Ω)ejΩt dΩ (1.7)
2π −∞
Equations (1.6) and (1.7) are called the Fourier transform and inverse Fourier
transform, respectively. The magnitude and phase angle of F (Ω), namely,
 
−1 Im{F (Ω)}
|F (Ω)| and ∠F (Ω) = tan
Re{F (Ω)}
are called the magnitude spectrum and phase spectrum, respectively. In the
sequel, we shall use a two-way arrow to represent a Fourier transform pair,
for instance
f (t) ←→ F (Ω) (1.8)
We now consider the energy contained in a signal and then relate that energy
to the Fourier transform of the signal.
The total energy contained in a continuous-time signal f (t) is given by
 ∞
E= |f (t)|2 dt (1.9)
−∞

From (1.6) and (1.7), it follows that


 ∞  ∞
2 1
|f (t)| dt = |F (Ω)|2 dΩ (1.10)
−∞ 2π −∞
8 Introduction

This is called Parseval’s theorem that relates the energy contained in a


continuous-time signal to the Fourier transform of that signal.

1.4.3 The Laplace Transform


We start by obtaining the Fourier transform of an one-sided function of time,
multiplied by a function that decays as time increases, that is, f (t) = 0 for
t < 0 and e−σt f (t) for t ≥ 0 where σ is a real number. The Fourier transform
pair in this case can be written as
 ∞
Fσ (Ω) = f (t)e−(σ+jΩ)t dt for − ∞ < Ω < ∞ (1.11)
0

and  ∞
1
f (t) = Fσ (Ω)e(σ+jΩ)t dΩ (1.12)
2π −∞
where the factor e−σt has been moved from the left side of (1.12) to the right.
By defining
s = σ + jΩ (1.13)
we obtain
ds
=j (1.14)

provided that σ is constant. Substituting (1.13) and (1.14) into (1.11) and
(1.12) yields  ∞
F (s) = L[f (t)] = f (t)e−st dt (1.15)
0
and  σ+j∞
−1 1
f (t) = L [F (s)] = F (s)est ds (1.16)
2πj σ−j∞
respectively. Function F (s) is called the Laplace transform of f (t) and,
conversely, f (t) is called the inverse Laplace transform of F (s). In the sequel,
we shall use a two-way arrow to represent a Laplace transform pair, for instance
f (t) ←→ F (s) (1.17)

Final-value theorem:
The final-value theorem can be stated as follows [7]: If f (t) and df (t)/dt are
both Laplace transformable, if F (s) is the Laplace transform of f (t), and if
limt→∞ f (t) exists, then
1.4 Analysis of Analog Signals 9

lim f (t) = lim sF (s) (1.18)


t→∞ s→0

Initial-value theorem:
The initial-value theorem can be stated as follows [7]: If f (t) and df (t)/dt
are both Laplace transformable and if lims→∞ sF (s) exists, then

f (0) = lim f (t) = lim sF (s) (1.19)


t→+0 s→∞

Some common Laplace transform pairs can be found in Table 1.1.

Table 1.1 Laplace transform pairs


f (t) F (s)

Unit impulse δ(t) 1


1
Unit step uo (t)
s
1
e−at
s+a
1
t
s2
s
cos Ωt
s2 + Ω2
Ω
sin Ωt
s2 + Ω2
s+a
e−at cos Ωt
(s + a)2 + Ω2
Ω
e−at sin Ωt
(s + a)2 + Ω2
df (t)
sF (s) − f (0)
dt
 t
F (s)
f (τ )dτ
0 s

e−at f (t) F (s + a)
 t
f1 (τ )f2 (t − τ )dτ F1 (s)F2 (s)
0
10 Introduction

1.5 Analysis of Discrete-Time Signals


1.5.1 Sampling an Analog Signal
We now consider the process of sampling an analog signal and holding this
value. A discrete-time signal can be obtained by sampling an analog signal at
times kT for k = 0, 1, 2, · · · where T is the sampling period. The sampling
frequency is given by
1
Fs = Hz (1.20)
T
The sampling process is illustrated in Figure 1.3.
Suppose fa (t) is an analog signal input to the sampler, then the sampled
output signal fˆa (t) is the product

fˆa (t) = fa (t)ΔT (t) (1.21)

where the subscript a of fa (t) in (1.21) is used to indicate an analog signal. The
modulating function ΔT (t) is a train of uniformly spaced impulse functions
given by

fa(t) fa(kT)
Analog Discrete-time
Signal Signal
T
Sampler
(a)
fa (t)

fa (kT)

f a (t)

f a (kT)

t
0 T 3T 5T 7T 9T kT
(b) (c)
Figure 1.3 Periodic sampling of an analog signal: (a) Sampler, (b)Analog signal, (c) Sampled-
data signal.
1.5 Analysis of Discrete-Time Signals 11


ΔT (t) = δa (t − kT ) (1.22)
k=−∞

where δa (t) is the Dirac delta function or simply the impulse function
defined by
  ∞
∞, t=0
δa (t) = δa (t)dt = 1
0, t = 0 −∞
 ∞
fa (t)δa (t − t0 )dt = fa (t0 )
−∞

If ΔT (t) is replaced by its series representation, the sampled output signal


fˆa (t) in (1.21) can be expressed as


fˆa (t) = fa (t) δa (t − kT )
k=−∞
(1.23)


= fa (t)δa (t − kT )
k=−∞

1.5.2 The Discrete-Time Fourier Transform


We shall start by finding the Fourier transform of a sampled analog signal fˆa (t),
which has been discussed in the previous section. The Fourier transform of
fˆa (t) is given by
 ∞
ˆ
F[fa (t)] = fˆa (t) e−jΩt dt
−∞
 ∞ ∞

= fa (t)δa (t − kT ) e−jΩt dt (1.24)
−∞ k=−∞

∞ 
 ∞
= fa (t)δa (t − kT ) e−jΩt dt
k=−∞ −∞

Using a property of the Dirac delta function, this expression can be


reduced to


F[fˆa (t)] = fa (kT ) e−jΩkT (1.25)
k=−∞
12 Introduction

where fa (kT ) denotes the value of the kth sample of fa (t).


If we let f (k) = fa (kT ), then (1.25) induces the Fourier transform of
{f (k)} as F[f (k)]. In general, the Fourier transform of the discrete-time
signal is given by


F (ω) = F[f (k)] = f (k) e−jωk (1.26)
k=−∞

where ω = ΩT . F (ω) in (1.26) is called the discrete-time Fourier transform


of a sequence f (k). The Fourier transform F (ω) is periodic with period 2π, as
can easily be verified from (1.26). The inverse discrete-time Fourier transform
of F (ω) is found to be
 2π
−1 1
f (k) = F [F (ω)] = F (ω)ejωk dω (1.27)
2π 0
The magnitude and phase angle of F (ω), namely,
 
−1 Im{F (ω)}
|F (ω)| and ∠F (ω) = tan
Re{F (ω)}
are called the magnitude spectrum and phase spectrum, respectively. In the
sequel, we shall use a two-way arrow to represent a discrete-time Fourier
transform pair, for instance

f (k) ←→ F (ω) (1.28)

We now derive Parseval’s theorem for discrete-time signals. The total energy
contained in a discrete-time signal f (k) is given by


E= |f (k)|2 (1.29)
k=−∞

From (1.26) and (1.27), it follows that



  2π
2 1
|f (k)| = |F (ω)|2 dω (1.30)
2π 0
k=−∞

where |F (ω)|2 is the power spectrum of the signal f (k). This is the discrete-
time version of Parseval’s theorem.
1.5 Analysis of Discrete-Time Signals 13

1.5.3 The Discrete Fourier Transform (DFT)


Given a finite length sequence {f (n)} which is defined only in the interval
0 ≤ n ≤ N − 1, the discrete Fourier transform (DFT) is defined by
N
 −1
F (k) = f (n) WNkn for k = 0, 1, · · · , N − 1 (1.31)
n=0

where WN = e− N and WN is called the twiddle factor. Conversely, the
inverse discrete Fourier transform (IDFT) is given by
N −1
1 
f (n) = F (k) WN−kn for n = 0, 1, · · · , N − 1 (1.32)
N
k=0

The magnitude and phase angle of F (k), namely,


 
−1 Im{F (k)}
|F (k)| and ∠F (k) = tan
Re{F (k)}
are called the magnitude spectrum and phase spectrum, respectively. In the
sequel, we shall use a two-way arrow to represent a DFT pair, for instance

f (n) ←→ F (k) (1.33)

We now examine Parseval’s theorem for an N -point sequence f (n) and its
N -point DFT F (k). The total energy contained in an N -point sequence f (n)
is given by
N
 −1
E= |f (n)|2 (1.34)
n=0
From (1.31) and (1.32), it follows that
N
 −1 N −1
1 
|f (n)|2 = |F (k)|2 (1.35)
N
n=0 k=0

This is commonly referred to as Parseval’s theorem for the DFT.

1.5.4 The z -Transform


In the study of discrete-time signals and systems, the z-transform plays an
important role similar to that of the Laplace transform for continuous-time
14 Introduction

signals and systems. We shall start by obtaining the Laplace transform of a


sampled analog signal fˆa (t), which has been discussed in Section 1.5.1. From
(1.23), the Laplace transform of fˆa (t) is described by
 ∞
ˆ
L[fa (t)] = fˆa (t) e−st dt
0
 ∞ ∞

= fa (t)δa (t − kT ) e−st dt
0
(1.36)
k=−∞
∞ 
 ∞
= fa (t)δa (t − kT ) e−st dt
k=−∞ 0

By virtue of a property of the Dirac delta function, this expression can be


deduced to
∞
L[fˆa (t)] = fa (kT ) e−skT (1.37)
k=−∞

If we let f (k) = fa (kT ) and


esT = z (1.38)
then (1.37) and (1.38) induce the Laplace transform of {f (k)} as Z[f (k)]. In
general, the z-transform of the discrete-time signal f (k) is given by


F (z) = Z[f (k)] = f (k) z −k (1.39)
k=−∞

The reader is referred to Sections 2.3, 2.4 and 2.5 for further details.

1.6 Sampling of Continuous-Time Sinusoidal Signals


We now consider a continuous-time sinusoidal signal of the form

xa (t) = A cos(Ωt + θ) = A cos(2πF t + θ)


(1.40)
−∞ < t < ∞, −∞ < Ω, F < ∞

where A is the amplitude of the sinusoid, Ω is the frequency in radians per


second (rad/s), F is the frequency in hertz (Hz), θ is the phase in radian, and
Ω = 2πF .
1.6 Sampling of Continuous-Time Sinusoidal Signals 15

By acquiring samples of the signal xa (t) every T seconds, the discrete-time


sinusoidal signal can be obtained as

x(k) = xa (kT ) = A cos(2πF kT + θ) = A cos(kω + θ) (1.41)

where
F
ω = 2πf = 2πF T = ΩT, f = FT =
Fs
Here, f and Fs are called the normalized frequency and the sampling
frequency, respectively.
Assuming that

ωi = ω0 + 2πi for i = 0, 1, 2, · · · (1.42)

we have
xi (k) = A cos(kωi + θ) = A cos{k(ω0 + 2πi) + θ}
(1.43)
= A cos(kω0 + θ) = x0 (k)

hence discrete-time sinusoids, whose frequencies are separated by an integer


multiple of 2π, are identical. Alternatively, the sequences of any two sinusoids
with frequencies in the range −π ≤ ω ≤ π or − 12 ≤ f ≤ 21 are distinct. In
other words, discrete-time sinusoidal signal with frequencies in the range
|ω| ≤ π or |f | ≤ 12 are unique, and all frequencies in the range |ω| > π or
|f | > 12 are aliases. As a result, periodic sampling of a continuous-time signal
can be viewed as a mapping of an infinite frequency range −∞ < F < ∞
into a finite frequency range − 21 ≤ f ≤ 12 .

From f = F/Fs and |f | ≤ 12 , it follows that

Fs Fs
− ≤F ≤ (1.44)
2 2
Since the highest frequency in a discrete-time signal is ω = π, i.e. f = 12
with a sampling rate Fs = 1/T , the corresponding highest value of the
continuous-time frequency is given by
Fs 1
Fmax = = (1.45)
2 2T
16 Introduction

1.7 Aliasing
We now examine what happens to the frequencies {F } with F > Fmax =
Fs /2.
With sampling rate Fs = 1/T , the sampling of a continuous-time
sinusoidal signal
xa (t) = A cos(2πF0 t + θ) (1.46)
yields a discrete-time signal of the form

x(k) = xa (kT ) = A cos(2πF0 kT + θ) = A cos(2πf0 k + θ) (1.47)

where f0 = F0 /Fs is the normalized frequency of the sinusoid. Suppose the


frequency range for a continuous-time signal xa (t) is −Fs /2 ≤ F0 ≤ Fs /2,
the frequency range for a discrete-time signal x(k) is −1/2 ≤ f0 ≤ 1/2 which
is a one-to-one relationship between F0 and f0 . Hence, the analog signal xa (t)
can be reconstructed from the samples x(k).
Alternatively, if the sinusoid signals described by

xa (t) = A cos(2πFi t + θ) (1.48)

are sampled at rate Fs = 1/T where

Fi = F0 + iFs , i = ±1, ±2, · · ·

then the frequency Fi is outside the frequency range −Fs /2 ≤ F0 ≤ Fs /2.


In this case, the sampled signal is described by

x(k) = xa (kT ) = A cos(2πFi kT + θ)
 
F0 + iFs
= A cos 2π k+θ (1.49)
Fs
= A cos(2πf0 k + θ)

Equation (1.49) coincides with (1.47) which is derived from (1.46) by


sampling. Hence, there exist an infinite number of continuous-time sinusoids
that produce the same discrete-time signal after sampling. As a result, the
frequencies Fi = F0 + iFs for −∞ < i < ∞ are indistinguishable
from the frequency F0 after sampling and hence they are aliases of F0 . The
relation between the frequency variables of the continuous-time and discrete-
time signals is illustrated in Figure 1.4. An example of aliasing is shown in
1.8 Sampling Theorem 17

1
2

F
Fs Fs 0 Fs Fs
2 2
1
2

Figure 1.4 Relationship between the continuous-time and discrete-time frequency variables
in periodic sampling.

1kHz 400Hz : Fs=1.4kHz

Figure 1.5 Illustration of aliasing.

Figure 1.5 where two sinusoids with frequencies F0 = 400 Hz and F−1 =
−1000 Hz yield identical samples when a sampling rate of Fs = 1400 Hz is
employed.

1.8 Sampling Theorem


The periodic function ΔT (t) in (1.22) can be written using the Fourier series
expansion as
∞

ΔT (t) = ck ej T kt (1.50)
k=−∞
18 Introduction

where the Fourier series expansion coefficients are given by


 T
1 2 2π
ck = ΔT (t) e−j T kt dt (1.51)
T − T
2

Substituting (1.22) into (1.51), we obtain


 T
1 2 2π 1 1
ck = δa (t) e−j T kt dt = e0 = (1.52)
T −T T T
2

Hence, by using (1.50), the sampled output signal fˆa (t) in (1.21) can be
expressed as

1  2π
fˆa (t) = fa (t) ΔT (t) = fa (t) ej T kt (1.53)
T
k=−∞

Taking the Laplace transform on the both sides of (1.53) then gives
 ∞
ˆ 1 ∞  2π
F̂a (s) = L[fa (t)] = fa (t) ej T kt e−st dt
T 0
k=−∞
∞  ∞
1  2π
= fa (t) e−(s−j T k)t dt (1.54)
T 0
k=−∞
∞  
1  2π
= Fa s − j k
T T
k=−∞

where Fa (s) = L[fa (t)]. Substituting s = jΩ into (1.54), we obtain




F̂a (jΩ) = Fs Fa j(Ω − kΩs )
k=−∞
(1.55)


= Fs Fa j2π(F − kFs )
k=−∞

where Ω = 2πF and Ωs = 2π/T = 2πFs . This explains the relationship


between the spectrum F̂a (jΩ) of the sampled discrete-time signal fa (kT )
and the spectrum Fa (jΩ) of the analog signal fa (t). The right side of (1.55)
is periodic with repetition of the spectrum Fs Fa (jΩ) and period Ωs .
1.8 Sampling Theorem 19

We now consider the case where the spectrum Fa (jΩ) of an analog signal
fa (t) is band-limited as shown in Figure 1.6(a) and the spectrum is zero
for |F | ≥ B. If the sampling frequency Fs is chosen as Fs > 2B, the
spectrum F̂a (jΩ) of the sampled discrete-time signal fa (kT ) appears as shown
in Figure 1.6(b). Therefore, if the sampling frequency Fs is chosen so that
Fs ≥ 2B, where the frequency Fs = 2B is usually referred as the Nyquist
frequency, then it follows that

Fa( )

F
-B 0 B
(a)
^ (
F )
a

FsFa[ j s
Fs FsFa ( ) FsFa[ j s

F
-Fs 0 Fs Fs
(b) 2
^ (
F )
a

Fs

F
-Fs 0 Fs Fs
(c) 2
^ (
F )
a

Fs

F
-Fs 0 Fs
(d)
Figure 1.6 Aliasing of spectral components: (a) Spectrum of a band-limited analog signal,
(b) Spectrum of the discrete-time signal, (c) (d) Spectrum of the discrete-time signal with
spectral overlap.
20 Introduction

F̂a (jΩ) = Fs Fa (jΩ), |F | ≤ Fs /2 (1.56)


In this case, aliasing does not exist and hence the spectrum of the sampled
discrete-time signal fa (kT ) is essentially identical with that of the analog
signal fa (t) in the frequency range |F | ≤ Fs /2.
If the sampling frequency Fs is chosen so that Fs < 2B, the periodic
continuation of Fa (jΩ) generates spectral overlap as shown in Figures 1.6(c)
and 1.6(d). Hence, the spectrum F̂a (jΩ) of the sampled discrete-time signal
fa (kT ) contains aliased frequency components of the analog signal spectrum
Fa (jΩ). As a result, the analog signal fa (t) cannot be recovered from its
sample values {f (kT )} due to aliasing.
From the above observations, the sampling theorem can be stated as
follows.

Sampling Theorem:
If the highest frequency contained in an analog signal fa (t) is Fmax = B and
the signal is sampled at a rate Fs ≥ 2Fmax = 2B, then fa (t) can be exactly
recovered from its sample values {fa (kT )}.

1.9 Recovery of an Analog Signal


For a band-limited signal with highest frequency B, it is possible to avoid
aliasing by sampling the signal at a frequency Fs = 1/T (Ωs = 2π/T )
satisfying
Fs = 1/T ≥ 2B ( Ωs = 2π/T ≥ 4πB ) (1.57)
Namely, aliasing is avoidable if the signal is sampled at a frequency Fs which
is at least as high as the Nyquist frequency 2B. If the condition in (1.57) is
satisfied, then
Ωs Fs
F̂a (jΩ) = Fs Fa (jΩ) for |Ω| ≤ |F | ≤ (1.58)
2 2
Ωs
holds. Since Fa (jΩ) = 0 outside the interval |Ω| ≤ 2 , by using the inverse
Fourier transform of Fa (jΩ), we obtain
 ∞  Ωs /2
1 jΩt T
fa (t) = Fa (jΩ)e dΩ = F̂a (jΩ)ejΩt dΩ (1.59)
2π −∞ 2π −Ωs /2

If F̂a (jΩ) in (1.59) is represented by the discrete-time Fourier transform


1.10 Summary 21


F̂a (jΩ) = fa (kT )e−jΩkT (1.60)
k=−∞
then it follows that
 Ωs /2 ∞

T
fa (t) = fa (kT )e−jΩkT ejΩt dΩ
2π −Ωs /2 k=−∞

∞  Ωs /2
T 
= fa (kT ) ejΩ(t−kT ) dΩ
2π −Ωs /2
k=−∞


Ωs /2 (1.61)
T  ejΩ(t−kT )
= fa (kT )
2π j(t − kT )
k=−∞ −Ωs /2

 sin(π/T )(t − kT )
= fa (kT )
(π/T )(t − kT )
k=−∞

The reconstruction formula in (1.61) is known as an interpolation formula


for reconstructing fa (t) from its sample values. This interpolation formula is
built on the interpolation function
sin(π/T )t
g(t) = (1.62)
(π/T )t
by summing up appropriately shifted g(t) by kT , namely g(t − kT ), for
k = 0, ±1, ±2, · · · with the sample values fa (kT ) as the weights. It is noted
that at t = iT , the interpolation function g(t − kT ) is zero except at i = k.
As a result, fa (t) evaluated at t = iT is simply equal to the sample value
fa (iT ). At all other times, the weighted sum of the time shifted versions
of the interpolation function combines to produce exactly fa (t). The ideal
band-limited interpolation process is illustrated in Figure 1.7.

1.10 Summary
This chapter has introduced terminology for signal analysis and examples
of typical signals, and presented an overview of digital signal processing
with the explanation of its advantages and disadvantages. This chapter has
also analyzed analog and discrete-time signals, and examined the sampling
of analog signals and that of continuous-time sinusoidal signals in connection
22 Introduction

fa(t)
fa(kT)

(k-2)T (k-1)T kT (k+1)T


Figure 1.7 Ideal band-limited reconstruction by interpolation.

to aliasing. Moreover, the sampling theorem has been presented with the
explanation of how to recover an analog signal from its discrete-time samples.

References
[1] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, NJ:
Prentice-Hall, 1975.
[2] S. K. Mitra, Digital Signal Processing, 3rd ed. NJ: McGraw-Hill, 2006.
[3] J. G. Proakis and D. J. Manolakis, Digital Signal Processing: Principles,
Algorithms, and Applications, 2nd ed. New York, Macmillan Publishing
Company, 1992.
[4] J. R. Johnson, Introduction to Digital Signal Processing, NJ: Prentice
Hall, 1989.
[5] T. Higuchi, Fundamentals of Digital Signal Processing, Tokyo, Japan,
Shokodo, 1986.
[6] P. M. Chirlian, Signals and Filters, New York: Van Nostrand Reinhold,
1994.
[7] K. Ogata, Modern Control Engineering, 3rd ed., New Jersey: Prentice-
Hall, Inc. 1997.
2
Discrete-Time Systems
and z-Transformation

2.1 Preview
This chapter covers fundamental concepts relating to discrete-time signals and
systems. In Section 2.2, we introduce common and important discrete-time
sequences such as unit pulse sequence, unit step sequence, unit ramp sequence,
exponential sequence, and cosine sequence. Most of these sequences consist
of evenly space samples of familiar continuous-time functions. In Section 2.3,
the one-side z-transform of a discrete-time sequence is defined, and then
fundamental z-transforms as well as z-transform properties are explained. In
Section 2.4, we present three methods for computing inverse z-transform using
partial fraction expansion, power series expansion, and contour integration,
respectively.
Several concepts that are most relevant to discrete-time systems, such as
linearity, time-invariance, stability, and causality, are studied in Section 2.5. In
Sections 2.6 and 2.7, linear time-invariant discrete-time systems are described
in terms of difference equations as well as state-space descriptions. The
block diagrams of digital filters and two methods for constructing state-space
descriptions from a difference equation are presented. In Section 2.8, transfer
functions in the frequency domain for causal linear time-invariant discrete-
time systems are introduced, and several analysis and design issues for all-pass,
notch, and doubly complementary digital filters are examined.

2.2 Discrete-Time Signals


Adiscrete-time signal f (k) is a function of a discrete variable k. It is a sequence
of numbers, called samples, indexed by the sample number k. In the rest of
the book, unless specified otherwise, it is always assumed that a discrete-time
signal is a sequence of samples obtained by sampling a continuous-time signal

23
24 Discrete-Time Systems and z-Transformation

with a constant sampling frequency. Below are several special sequences that
are found useful in the analysis of discrete-time signals and systems.
1. Unit Pulse Sequence
The unit pulse sequence has a unit sample for k = 0 and all subsequent
samples are zero, i.e.,

1, k=0
δ(k) = (2.1)
0, otherwise

2. Unit Step Sequence


The unit step sequence has samples that are unity for k = 0 and thereafter,
namely, 
1, k = 0, 1, 2, · · ·
uo (k) = (2.2)
0, otherwise
3. Unit Ramp Sequence
The unit ramp sequence is defined as

f (k) = kuo (k) (2.3)

where uo (k) is the unit step sequence.


4. Exponential Sequence
An exponential sequence is evenly spaced samples of an exponential
function, that is,

e−αkT , k = 0, 1, 2, · · ·
f (k) = (2.4)
0, otherwise

where α > 0 and constant T , termed the sampling interval or sampling


period, is the time interval between samples.
5. Cosine Sequence
A cosine sequence is spaced samples of a cosine continuous-time
function, that is,

cos(ωkT + θ), k = 0, 1, 2, · · ·
f (k) = (2.5)
0, otherwise.

These sequences are shown in Figure 2.1.


2.3 z-Transform of Basic Sequences 25

δ (k) uo(k)
1 1
...

−1 1 2 3 ... k −1 1 2 3 ... k
(a) (b)

k uo(k) exp(−α kT )uo(k)


1
4
...
2 ...
−1 1 2 3 ... k −1 1 2 3 ... k
(c) (d)

cos(ω kT+θ )uo(k)


1
3 ...
−1 1 2 k

(e)
Figure 2.1 Several fundamental sequences. (a) Unit pulse. (b) Unit step. (c) Unit ramp.
(d) Exponential. (e) Sinusoidal.

2.3 z -Transform of Basic Sequences


The one-side z-transform of a sequence f (k) is defined by


Z[f (k)] = F (z) = f (k)z −k (2.6)
k=0

The z-transform is denoted by Z[·] or F (z). Note that the one-sided


z-transform excludes samples before step zero in the transform.

2.3.1 Fundamental Transforms


1. The z-transform of the unit pulse is
26 Discrete-Time Systems and z-Transformation


Z[δ(k)] = δ(k)z −k = δ(0) = 1 (2.7)
k=0
2. The z-transform of the unit step sequence becomes

 1 z
Z[uo (k)] = z −k = −1
= (2.8)
1−z z−1
k=0

3. The z-transform of a geometrical progression f (k) = ak is



 1 z
Z[ak ] = ak z −k = −1
= (2.9)
1 − az z−a
k=0

4. The z-transform of a ramp function f (k) = kuo (k) is


Z[kuo (k)] = 0 + z −1 + 2z −2 + 3z −3 + 4z −4 + · · ·
(2.10)
z −1 Z[kuo (k)] = 0 + z −2 + 2z −3 + 3z −4 + · · ·
where uo (k) is the unit step sequence in (2.2). By subtracting the second
equation from the first equation in (2.10), we obtain
z −1
(1 − z −1 )Z[kuo (k)] = z −1 (1 + z −1 + z −2 + z −3 + · · · ) =
1 − z −1
(2.11)
which yields
z −1 z
Z[kuo (k)] = = (2.12)
(1 − z −1 )2 (z − 1)2
5. The z-transform of a cosine function f (k) = cos(kωT ) becomes
∞ ∞ jkωT
    e + e−jkωT −k
Z cos kωT = cos kωT z −k = z
2
k=0 k=0


1  
= (ejωT z −1 )k + (e−jωT z −1 )k
2 (2.13)
k=0
1 1 1
= jωT −1
+ −jωT
2 1−e z 1−e z −1
1 − z −1 cos ωT z(z − cos ωT )
= −1 −2
= 2
1 − 2z cos ωT + z z − 2z cos ωT + 1
The z-transform of several important sequences are listed in Table 2.1.
2.3 z-Transform of Basic Sequences 27

Table 2.1 z-Transform Pairs


f (k) F (z)
δ(k) 1
z
uo (k)
z−1
z
ak
z−a
z
k
(z − 1)2
z(z − cos ωT )
cos kωT
z 2 − 2z cos ωT + 1
z sin ωT
sin kωT
z 2 − 2z cos ωT + 1
z(z − r cos ωT )
rk cos kωT
z 2 − 2z r cos ωT + r2
z r sin ωT
rk sin kωT
z 2 − 2z r cos ωT + r2

2.3.2 Properties of z -Transform


Several important properties of the z-transform are given below.
1. Linearity
The z-transform of a linear combination of sequences is the linear
combination of the individual z-transforms, that is,

 ∞
−k
Z[a1 f1 (k) + a2 f2 (k)] = a1 f1 (k)z + a2 f2 (k)z −k
k=0 k=0
(2.14)
= a1 Z[f1 (k)] + a2 Z[f2 (k)]
= a1 F1 (z) + a2 F2 (z)
where a1 and a2 are arbitrary constants. Equation (2.14) shows that the
z-transform is a linear transform.
2. Step-Shifted Relations
A sequence that is delayed i steps has the z-transform
∞ ∞

Z[f (k − i)] = f (k − i)z −k = z −i f (k − i)z −(k−i)
k=0 k=0

(2.15)

−k −i
= z −i f (k)z = z F (z)
k=0
28 Discrete-Time Systems and z-Transformation

where f (k) = 0 for k < 0. Similarly, a sequence that is i-step advance


has the z-transform
∞ ∞

−k i
Z[f (k + i)] = f (k + i)z = z f (k + i)z −(k+i)
k=0 k=0

 i−1

= zi f (k)z −k − f (k)z −k (2.16)
k=0 k=0
i−1

= zi F (z) − f (k)z −k
k=0

3. Multiplication by a geometrical progression ak


The z-transform of sequence ak f (k) with a nonzero constant a is given by

 ∞

Z[ak f (k)] = ak f (k)z −k = f (k)(a−1 z)−k = F (a−1 z)
k=0 k=l
(2.17)
4. Differentiation
A sequence multiplied by the step index k has the z-transform

 ∞

Z[kf (k)] = kf (k)z −k = f (k)(kz −k )
k=0 k=0

 ∞
d −k d 
= f (k) −z z = −z f (k)z −k (2.18)
dz dz
k=0 k=0

dF (z)
= −z
dz
5. Convolution
The z-transform of the convolution of two sequences h(k) and f (k) is
given by

 ∞ 
 ∞
Z h(l)f (k − l) = h(l)f (k − l)z −k
l=0 k=0 l=0

 ∞

−l
= h(l)z f (k − l)z −(k−l) = H(z)F (z)
l=0 k=0
(2.19)
where f (k) = 0 for k < 0.
2.4 Inversion of z-Transforms 29

6. Initial-Value Theorem
The initial value of a sequence f (k) is the value of the sequence at k = 0.
It can be found from the z-transform of the sequence as
 
f (0) = lim f (0) + f (1)z −1 + f (2)z −2 + · · · = lim F (z)
z→∞ z→∞
(2.20)
7. Final-Value Theorem
The Final value of a sequence f (k) is the value of the sequence at k = ∞.
By virtue of
N

f (k)z −k = f (0) + f (1)z −1 + · · · + f (N )z −N
k=0
N

f (k − 1)z −k = z −1 [f (0) + f (1)z −1 + · · · (2.21)
k=0
+ f (N − 1)z −(N −1) ]
N
 −1
−1
=z f (k)z −k
k=0

it follows that
N
 N
 −1
f (N ) = lim f (k)z −k − z −1 f (k)z −k (2.22)
z→1
k=0 k=0

As N → ∞, the final-value theorem is found from (2.22) as


 
lim f (k) = lim F (z) − z −1 F (z) = lim (1 − z −1 )F (z) (2.23)
k→∞ z→1 z→1

2.4 Inversion of z -Transforms


The z-transform of a sequence f (k) is defined in (2.6) as


F (z) = f (k)z −k (2.24)
k=0

By multiplying both sides of (2.24) by z i−1 and integrating with a contour


integral for which the contour of integration encloses the origin and lies
entirely in the region of convergence of F (z), we obtain
30 Discrete-Time Systems and z-Transformation

  

1 i−1 1
F (z)z dz = f (k)z −k+i−1 dz
2πj C 2πj C k=0
∞  (2.25)
 1 −k+i−1
= f (k) z dz
2πj C
k=0

where C is a counterclockwise closed contour in the region of convergence


of F (z) and encircling the origin of the z-plane. Applying Cauchy’s integral
theorem  
1 1, i=0
z i−1 dz = (2.26)
2πj C 0, i = 0
hence the right-hand side of (2.25) is equal to f (i). Therefore, the inverse
z-transform can be described by the contour integral as

1
f (k) = F (z)z k−1 dz (2.27)
2πj C
We now examine some methods for obtaining the inverse z-transform from a
given z-transform.

2.4.1 Partial Fraction Expansion


A useful method for obtaining the inverse z-transform from a given rational
z-transform is to perform a partial-fraction expansion and then compute the
inverse z-transform of each individual term. Suppose a rational z-transform
F (z) has been expressed in a partial-fraction expansion of the form
c1 c2 cN
F (z) = −1
+ −1
+ ··· + (2.28)
1 − λ1 z 1 − λ2 z 1 − λN z −1
then the inverse z-transform of F (z) is given by
f (k) = c1 λk1 + c2 λk2 + · · · + cN λkN for k = 0, 1, 2, . . . (2.29)
where the coefficients ci for i = 1, 2, · · · , N can be derived from

ci = (1 − λi z −1 )F (z)z=λ
i

As an example, consider the problem of computing the inverse z-transform of


−z 2 (z + 1)
F (z) =
6z 3 − 11z 2 + 6z − 1
2.4 Inversion of z-Transforms 31

We now express F (z) as

−(1 + z −1 )
F (z) =
6(1 − z −1 )(1 − 12 z −1 )(1 − 1
3 z −1 )

c1 c2 c3
= + 1 +
1−z −1 1− 2z −1 1 − 13 z −1

and compute
 1 −1 
  3
−1  
c1 = (1 − z )F (z) = −1, c2 = 1 − z F (z) =
z=1 2 z=1/2 2
 1   2

c3 = 1 − z −1 F (z) =−
3 z=1/3 3
which yields
3  1 k 2  1 k
f (k) = −1 + − for k = 0, 1, 2, · · ·
2 2 3 3

2.4.2 Power Series Expansion


For a rational z-transform, a power series expansion can be obtained by long
division. Suppose a rational z-transform F (z) can be expressed in a power
series expansion of the form
 ∞
b m z m + · · · + b1 z + b 0
F (z) = n = f (k)z −k , m≤n (2.30)
z + an−1 z n−1 + · · · + a0
k=0

then the inverse z-transform f (k) of F (z) is readily obtained from (2.30).
Consider for example
2z 2 + z
F (z) = 2
z − 2z + 1
which admits a power series expansion as

F (z) = 2 + 5z −1 + 8z −2 + 11z −3 + 14z −4 + 17z −5 + 20z −6 + · · ·

Hence we obtain

{f (0), f (1), f (2), f (3), f (4), f (5), f (6), · · · } = {2, 5, 8, 11, 14, 17, 20, · · · }
32 Discrete-Time Systems and z-Transformation

2.4.3 Contour Integration


For a rational z-transform, the contour integrals in (2.27) can be obtained
using the residue theorem, that is,

1
f (k) = F (z)z k−1 dz
2πj C
 
(2.31)
k−1
= residues of F (z)z at the poles inside C

k = 0, 1, 2, · · ·

In the case where z = a is a first-order pole, the residue of F (z)z k−1 becomes


Res(a) = (z − a)F (z)z k−1  (2.32)
z=a

If z = a is a multiple pole with multiplicity n, the residue of F (z)z k−1 is


given by

1 dn−1 n

k−1 
Res(a) = n−1
(z − a) F (z)z  (2.33)
(n − 1)! dz z=a

As an example, consider

1 z3
F (z) =
8 (z − 1)(z − 12 )(z − 1
4 )

We now compute
 1

Res(1) = (z − 1)F (z)z k−1  =
z=1 3
 1  1  1 k

Res( 12 ) = z − F (z)z k−1  =−
2 z=1/2 4 2
 1   1  1 k

Res( 14 ) = z − F (z)z k−1  =
4 z=1/4 24 4
which gives
1 1  1 k 1  1 k
f (k) = − + for k = 0, 1, 2, · · ·
3 4 2 24 4
2.5 Parseval’s Theorem 33

2.5 Parseval’s Theorem


In Section 1.5.2, Parseval’s theorem is explored in terms of Fourier transforms.
In this section, this theorem is extended to the domain of z-transforms. For
simplicity, we consider two real sequences f (k) and g(k) for k ≥ 0.

Theorem 2.1: Parseval’s Theorem


Let f (k) and g(k) be two sequences from l2 and let F (v) and G(v) be their
z-transforms. Then the following equality holds true:

 
1
f (k)g(k) = F (v)G(1/v)v −1 dv (2.34)
2πj C
k=0

where the contour of integration is taken in the overlap of the regions of


convergence of F (v) and G(1/v).

Proof
Define a sequence w(k) as

w(k) = f (k)g(k) (2.35)

so that


W (z) = f (k)g(k)z −k (2.36)
k=0
From (2.27), we can write

1
f (k) = F (v)v k−1 dv (2.37)
2πj C1

where C1 is a counterclockwise contour within the region of convergence of


F (v). Substituting (2.37) into (2.36) yields

 
1
W (z) = g(k) F (v)(z/v)−k v −1 dv
2πj C1
k=0
 ∞

1
= F (v) g(k)(z/v)−k v −1 dv (2.38)
2πj C1 k=0

1
= F (v)G(z/v)v −1 dv
2πj C
34 Discrete-Time Systems and z-Transformation

Note that

 

w(k) = W (z) (2.39)
z=1
k=0
which in conjunction with (2.38) gives
∞ 
1
f (k)g(k) = F (v)G(1/v)v −1 dv (2.40)
2πj C
k=0

This completes the proof of Theorem 2.1. 


Suppose F (z) and G(z) converge on the unit circle, we can choose v = ejω
so that (2.34) becomes

  2π
1
f (k)g(k) = F (ejω )G(e−jω )dω (2.41)
2π 0
k=0

In particular, if f (k) and g(k) are two complex sequences, then (2.34) is
changed to
 ∞ 
1
f (k)g ∗ (k) = F (v)G∗ (1/v ∗ )v −1 dv (2.42)
2πj C
k=0

because Z[g ∗ (k)] = G∗ (z ∗ ) where g ∗ (k) denotes the conjugate complex


number of g(k).

2.6 Discrete-Time Systems


A discrete-time system can be considered as a unique transformation or
operator that maps an input sequence u(k) into an output sequence y(k), i.e.,
y(k) = S[u(k)] (2.43)
where S[ · ] represents a transformation or an operator. Discussed below are
several basic notion and concepts related to discrete-time systems.
1. Linearity
The class of linear systems is defined by the principle of superposition.
Supposing that y1 (k) = S[u1 (k)] and y2 (k) = S[u2 (k)], the system in (2.43)
is said to be linear if
S[au1 (k) + bu2 (k)] = aS[u1 (k)] + bS[u2 (k)]
(2.44)
= ay1 (k) + by2 (k)
for arbitrary constants a and b.
2.6 Discrete-Time Systems 35

2. Time-Invariance
The system in (2.43) is said to be a time-invariant system provided that

y(k − ko ) = S[u(k − ko )] (2.45)

always holds for an arbitrary integer ko .


By using the unit pulse sequence δ(k) defined in (2.1), an arbitrary
sequence u(k) can be represented as


u(k) = u(i)δ(k − i) (2.46)
i=−∞

Therefore, the system in (2.43) can be written as




y(k) = S u(i)δ(k − i) (2.47)
i=−∞

Under the assumption that the system in (2.43) is linear and time-invariant,
(2.47) can be expressed as

 ∞

 
y(k) = u(i)S δ(k − i) = u(i)h(k − i) (2.48)
i=−∞ i=−∞

where h(k) = S[δ(k)]. The sequence h(k) in (2.48) is called unit-sample


response or impulse response. From (2.48), we see that a linear time-invariant
system is completely characterized by its impulse response.
Equation (2.48) is commonly referred to as the convolution sum, and is
said to be the convolution of u(k) with h(k) which is often denoted by

y(k) = u(k) ∗ h(k) (2.49)

An alternative expression for the system in (2.48) is given by




y(k) = h(i)u(k − i) = h(k) ∗ u(k) (2.50)
i=−∞

The system described by (2.48) or (2.50) is called a digital filter.


As illustrated in Figure 2.2, two linear time-invariant systems connected in
cascade form constitute a linear time-invariant system whose impulse response
is the convolution of the two impulse responses. Two linear time-invariant
36 Discrete-Time Systems and z-Transformation

u(k) h1(k) h2(k) y(k) h1(k)


u(k) + y(k)
u(k) h2(k) h1(k) y(k) h2(k)

u(k) h1(k)*h2(k) y(k) u(k) h1(k)+h2(k) y(k)


(a) (b)
Figure 2.2 Typical linear time-invariant systems with identical unit-pulse responses. (a)
Cascade forms. (b) Parallel forms.

systems connected in parallel form are equivalent to a single system whose


impulse response is the sum of the individual impulse responses.
A more restricted class of linear time-invariant systems of practical
significance obeys stability and causality.
3. Stability
A system is said to be stable if every bounded input produces a bounded output.
A linear time-invariant system is stable if and only if


|h(k)| < ∞ (2.51)
k=−∞

4. Causality
A system is said to be causal if the output y(k) for any k = ko depends on
the input u(k) for k ≤ ko only. A linear time-invariant system is causal if
and only if the impulse response h(k) is zero for k < 0. Therefore, for linear
time-invariant causal systems, (2.50) becomes

 k

y(k) = h(i)u(k − i) = h(k − i)u(i) (2.52)
i=0 i=−∞

By taking the z-transform of both sides in (2.52), we obtain




Y (z) = h(i)z −i U (z) (2.53)
i=0

where U (z) = Z[u(k)] and Y (z) = Z[y(k)]. Therefore,


Y (z)
H(z) = = h(0) + h(1)z −1 + h(2)z −2 + · · · (2.54)
U (z)
2.7 Difference Equations 37

The H(z) in (2.54) is called the transfer function of a linear time-invariant


causal system in (2.52) whose impulse response is an infinite-length sequence.
The digital filter in (2.52) is realizable and of practical importance.

2.7 Difference Equations


Consider a difference equation of the form
n
 n

y(k) = − ai y(k − i) + bi u(k − i) (2.55)
i=1 i=0

where u(k) is a scalar input, y(k) is a scalar output, ai ’s and bi ’s are scalar
coefficients, and the initial conditions are chosen as u(k) = y(k) = 0 for
k < 0. To illustrate the structure of a digital filter in a block diagram, we
introduce several basic block-diagram symbols for adder, constant multiplier,
and unit delay etc. as shown in Figure 2.3.
Using unit delay z −1 which is defined by
z −1 u(k) = u(k − 1), z −1 y(k) = y(k − 1) (2.56)
Equation (2.55) can be expressed as
  n  
n 
−i
1+ ai z y(k) = bi z −i u(k) (2.57)
i=1 i=0

Moreover, by introducing an appropriate intermediate variable v(k), (2.57) is


decomposed into two parts
y(k) = (b0 + b1 z −1 + · · · + bn z −n )v(k)
u(k) (2.58)
v(k) =
1 + a1 z + · · · + an z −n
−1

f(k) f(k) f1(k) + f1(k) + f2(k)

f(k) f2(k)
(a) (b)

f(k) af(k) f(k) z-1 f(k-1)


a
(c) (d)
Figure 2.3 Block-diagram symbols for digital filters. (a) Drawer point. (b)Adder. (c) Constant
multiplier. (d) Unit delay.
38 Discrete-Time Systems and z-Transformation

hence (2.57) is equivalent to

y(k) = b0 v(k) + b1 v(k − 1) + · · · + bn v(k − n)


(2.59)
v(k) = u(k) − a1 v(k − 1) − · · · − an v(k − n)

A block diagram of the system in (2.59) is depicted in Figure 2.4. This figure
is referred to as the direct form II structure and has the minimum possible
number of delays [1]. Since feedback loops are involved in the filter structure,
systems with such a structure are called IIR (Infinite Impulse Response) or
recursive digital filters. In the case where feedback loop does not exist, i.e.,
ai = 0 for i = 1, 2, · · · , n, the difference equation in (2.55) becomes

y(k) = b0 u(k) + b1 u(k − 1) + · · · + bn u(k − n) (2.60)

The block diagram of the system in (2.60) is drawn in Figure 2.5. Since there
exist only feedforward paths in Figure 2.5, systems with such a structure are
called FIR (Finite Impulse Response), nonrecursive, or transversal digital
filters.
Using (2.56), (2.57) is transformed into

y(k) = b0 u(k) + [b1 u(k) − a1 y(k)]z −1 + · · · + [bn u(k) − an y(k)]z −n


= b0 u(k) + [b1 u(k − 1) − a1 y(k − 1)] + · · · (2.61)
+ [bn u(k − n) − an y(k − n)]

y(k)
b0 b1 b2 bn-1 bn

v(k) z -1 z -1 z -1
a1 a2 an-1 an
u(k)

Figure 2.4 Direct form II structure of IIR digital filters.

u(k) z -1 z -1 z -1
b0 b1 b2 bn-1 bn
y(k)
Figure 2.5 FIR digital filters.
2.7 Difference Equations 39

y(k)

an an-1 a2 a1
z -1
z -1
z -1
bn bn-1 b2 b1 b0

u(k)
Figure 2.6 Transposed direct form II structure of IIR digital filters.

The block diagram of the system in (2.61) is illustrated in Figure 2.6. This
figure is referred to as the transposed direct form II structure [1] and is
generated by reversing the directions of all branches in the network of
Figure 2.4. Such a procedure is called flow-graph reversal or transposition,
that leads to a set of transposed filter structures. In addition, in the case where
feedback loop does not exist, i.e., ai = 0 for all i = 1, 2, · · · , n, (2.61)
becomes
y(k) = b0 u(k) + b1 u(k)z −1 + · · · + bn u(k)z −n
(2.62)
= b0 u(k) + b1 u(k − 1) + · · · + bn u(k − n)
The block diagram of the system in (2.62) is shown in Figure 2.7. The block
diagram in Figure 2.7 which is generated by reversing the directions of all
branches in thenetwork of Figure 2.5 is called the transposed form of an FIR
digital filter.
By taking the z-transform on the both sides of (2.55), we obtain
 n  
n 
−i
1+ ai z Y (z) = bi z −i U (z) (2.63)
i=1 i=0

which leads to
Y (z) b0 + b1 z −1 + · · · + bn z −n
H(z) = = (2.64)
U (z) 1 + a1 z −1 + · · · + an z −n

u(k)

bn b n-1 b1 b0
z -1
z -1
z -1
y(k)
Figure 2.7 Transposed form of FIR digital filters.
40 Discrete-Time Systems and z-Transformation

The H(z) in (2.64) is the transfer function of a difference equation in (2.55)


whose impulse response is a sequence of infinite length unless ai = 0 for all
i = 1, 2, · · · , n.

2.8 State-Space Descriptions


We now examine several procedures for realizing state-space models from
difference equations.

2.8.1 Realization 1
By substituting the second equation into the first equation in (2.59), we obtain

y(k) = b0 u(k) + (b1 − a1 b0 )v(k − 1) + · · · + (bn − an b0 )v(k − n)

v(k) = u(k) − a1 v(k − 1) − · · · − an v(k − n)


(2.65)
The block-diagram of the system in (2.65) is depicted in Figure 2.8 with
definition of
b̃i = bi − ai b0 for i = 1, 2, · · · , n
Defining a state-variable vector x(k) by

x(k) = [v(k − n), v(k − n + 1), · · · , v(k − 1)]T (2.66)

the difference equation in (2.65) can be realized by a state-space model as

x(k + 1) = Ax(k) + bu(k)


(2.67)
y(k) = cx(k) + du(k)

y(k)
~ ~ ~ ~
b0 b1 b2 bn-1 bn

z -1 z -1 z -1
v(k)
a1 a2 an-1 an
u(k)
Figure 2.8 Transformation of IIR digital filters.
2.8 State-Space Descriptions 41

where
⎡ ⎤
0 1 0 ··· 0 ⎡ ⎤
⎢ ⎥ 0
⎢ 0 0 1 ··· 0 ⎥ ⎢ .. ⎥
⎢ .. .. .. .. .. ⎥ ⎢ ⎥
A=⎢ . . . . . ⎥, b=⎢ . ⎥
⎢ ⎥ ⎣ 0 ⎦
⎣ 0 0 0 ··· 1 ⎦
1
−an −an−1 −an−2 · · · −a1
 
c = b̃n · · · b̃2 b̃1 , d = b0

2.8.2 Realization 2
By decomposing (2.61) into two parts, we have
y(k) = b0 u(k) + v(k)
v(k) = [b1 u(k) − a1 y(k)]z −1 + [b2 u(k) − a2 y(k)]z −2 + · · · (2.68)
+ [bn u(k) − an y(k)]z −n
which is equivalent to
y(k) = b0 u(k) + v(k)
v(k) = [(b1 − a1 b0 )u(k) − a1 v(k)]z −1 + · · · (2.69)
+ [(bn − an b0 )u(k) − an v(k)]z −n
The block diagram of the system in (2.69) is illustrated in Figure 2.9 with
definition of

b̃i = bi − ai b0 for i = 1, 2, · · · , n
Defining the output of each unit delay z −1 by state variables x̃1 (k),
x̃2 (k), · · · , x̃n (k) in order from the left to the right in Figure 2.9, the difference
equation in (2.69) can be realized by a state-space model as

v(k) y(k)

an an -1 a2 a1
z -1 z -1 z -1
~ ~ ~ ~
bn bn -1 b2 b1 b0

u(k)
Figure 2.9 Transformation of the transposed form of IIR digital filters.
42 Discrete-Time Systems and z-Transformation

x̃(k + 1) = Ãx̃(k) + b̃u(k)


(2.70)
y(k) = c̃ x̃(k) + du(k)
where x̃(k) = [x̃1 (k), x̃2 (k), · · · , x̃n (k)]T , v(k) = x̃n (k) and
⎡ ⎤ ⎡ ⎤
0 ··· 0 −an b̃n
⎢ . . .. ⎥ ⎢ .. ⎥
⎢ 1 . . .. . ⎥ ⎢ . ⎥
à = ⎢ . . ⎥ , b̃ = ⎢ ⎥
⎣ .. . . 0 −a2 ⎦ ⎣ b̃2 ⎦
0 ··· 1 −a1 b̃1
 
c̃ = 0 · · · 0 1 , d = b0

The coefficient matrices in (2.67) are related to those in (2.70) as


T T
A = Ã , b = c̃T , c = b̃ (2.71)

The system in (2.70) is called a dual system of the system in (2.67).

2.9 Frequency Transfer Functions


2.9.1 Linear Time-Invariant Causal Systems
We now consider a linear time-invariant causal system in (2.52) with the input
given by a complex exponential function

u(k) = ejωkT (2.72)

With such an input, we can write



 ∞

y(k) = h(i)ejω(k−i)T = h(i)e−jωiT ejωkT (2.73)
i=0 i=0

from which it immediately follows that




 ∞
 
jωT −jωiT −i 
H(e )= h(i)e = h(i)z  (2.74)

i=0 i=0 z=ejωT

The H(ejωT ) in (2.74) is called the frequency response or frequency transfer


function, which is obtained by substituting z = ejωT into its transfer function
H(z).
2.9 Frequency Transfer Functions 43

The H(ejωT ) in (2.74) can be expressed using polar coordinates form as


 
H(ejωT ) = H(ejωT ) ejθ(ω) (2.75)

 
where

H(ejωT ) = Re{H(ejωT )}2 + Im{H(ejωT )}2
 
Im{H(ejωT )}
θ(ω) = tan−1
Re{H(ejωT )}
are called the amplitude characteristic and phase characteristic, respectively.
It is obvious that |H(ejωT )| is an even function and θ(ω) is an odd function,
that is,    
H(ejωT ) = H(e−jωT ) , θ(−ω) = −θ(ω) (2.76)

2.9.2 Rational Transfer Functions


Consider a rational transfer function described by
B(z) b0 + b1 z −1 + · · · + bm z −m
H(z) = = (2.77)
A(z) 1 + a1 z −1 + · · · + an z −n
with frequency response
B(ejωT ) b0 + b1 e−jωT + · · · + bm e−jmωT
H(ejωT ) = jωT
= (2.78)
A(e ) 1 + a1 e−jωT + · · · + an e−jnωT
Using polar coordinates form, H(ejωT ) in (2.78) can be expressed as
 
H(ejωT ) = H(ejωT ) ejθ(ω) (2.79)
where

 2  2
 m m
bi cos iωT + bi sin iωT
  i=0 i=1
H(ejωT ) = 
 2  2
 n n
ai cos iωT + ai sin iωT
i=0 i=1
⎛ n ⎞ ⎛ m ⎞
 
⎜ ai sin iωT ⎟ ⎜ bi sin iωT ⎟
⎜ ⎟ ⎜ ⎟
θ(ω) = tan−1 ⎜
⎜
i=1
n
⎟ − tan−1 ⎜ i=1
⎟ ⎜ m


⎝ ⎠ ⎝ ⎠
ai cos iωT bi cos iωT
i=0 i=0
44 Discrete-Time Systems and z-Transformation

with a0 = 1 and −π ≤ ωT ≤ π. By virtue of (2.76), we have


 
H(ejωT ) = H(ejωT ) ejθ(ω)
  (2.80)
H(e−jωT ) = H(ejωT ) e−jθ(ω)

This leads to
1 H(ejωT )
θ(ω) = ln (2.81)
2j H(e−jωT )

The group delay of the phase characteristic is defined by


dθ(ω)
τ (ω) = − (2.82)

which implies that
 
dθ(ω) dz  dθ(ω) 
τ (ω) = − = − jT z (2.83)
dz dω z=ejωT dz z=ejωT

Using (2.81) and (2.83), we compute


 
T d ln H(z) d ln H(z −1 )
τ (ω) = − z −
2 dz dz z=ejωT

T 1 dH(z) 1 dH(z −1 )
=− z + z −1 (2.84)
2 H(z) dz H(z −1 ) dz −1 z=ejωT

1 dH(z)
= −T Re z
H(z) dz z=ejωT

where we have utilized


d ln H(z −1 ) d ln H(z −1 ) dz −1
=
dz dz −1 dz (2.85)
d ln H(z −1 )
= −z −2
dz −1
2.9 Frequency Transfer Functions 45

By substituting (2.77) into (2.84), we obtain


' (
A(z) 1 dB(z) dA(z)
τ (ω) = − T Re z · A(z) − B(z)
B(z) A2 (z) dz dz z=ejωT
1 dB(z) 1 dA(z)
= − T Re z −z
B(z) dz A(z) dz z=ejωT
⎡ m n ⎤ (2.86)
 
−i −i
⎢ ibi z iai z ⎥
⎢ i=1 ⎥
= − T Re⎢ ⎢ m − i=1
n


⎣  ⎦
bi z −i ai z −i
i=0 i=0 z=ejωT

2.9.3 All-Pass Digital Filters


All-pass digital filters form a particularly important class of IIR digital
filters in which the numerator polynomial is generated from the denominator
polynomial by reversing the order of the coefficients, hence the transfer
function of an all-pass digital filter assumes the form
an + an−1 z −1 + · · · + a1 z −(n−1) + z −n
H(z) =
1 + a1 z −1 + a2 z −2 + · · · + an z −n
(2.87)
1 + a1 z + · · · + an z n
= z −n
1 + a1 z −1 + · · · + an z −n
Hence the frequency response of the all-pass digital filter is given by
1 + a1 ejω + · · · + an ejnω −jnω
H(ejω ) = e = ejθ(ω) (2.88)
1 + a1 e−jω + · · · + an e−jnω
where ⎛ n ⎞

⎜ ai sin iω ⎟
⎜ ⎟
θ(ω) = −nω + 2 tan−1 ⎜

i=1
n


⎝  ⎠
1+ ai cos iω
i=1

0 if ω = 0
θ(ω) =
−nπ if ω = π
and without loss of generality, the sampling interval T has been set to
unity.
46 Discrete-Time Systems and z-Transformation

From (2.88), it is seen that the amplitude response of all-pass digital filters
is equal to unity over the entire baseband. As a result, by connecting a discrete-
time system with a properly designed all-pass digital filter in cascade, the phase
response of the system can be altered in a desired manner without affecting
the amplitude response of the original discrete-time system. Moreover, as a
signal processing building block all-pass digital filter admits computationally
efficient implementation that renders it useful in many signal processing
applications [8]. To see this, we express H(z) in (2.87) as H(z) = Y (z)/U (z).
By applying the inverse z-transform to the equation, the following nth-
order difference equation is found as the time-domain representation of the
filter in (2.87)
n

y(k) = − ai [y(k − i) − u(k − n + i)] + u(k − n) (2.89)
i=1

where only n multiplications are required to compute each output sample,


whereas 2n delay (or storage) elements are required to realize the filter.
Another useful structure for realizing all-pass digital filters is the Gray and
Markel lattice filter [8, 9]. For illustration purpose, we consider two multiplier
lattice two-pair for all-pass digital filter implementation, shown in Figure 2.10.
From the figure, we can write

Xi (z) = Ui (z) − ki z −1 Hi−1 (z)Xi (z)


  (2.90)
Yi (z) = z −1 Hi−1 (z) + ki Xi (z)

which yields
Yi (z) z −1 Hi−1 (z) + ki
Hi (z) = = (2.91)
Ui (z) 1 + ki z −1 Hi−1 (z)

U i (z)
-ki Xi ( z)
Hi - 1 ( z)
ki
Yi (z) z -1
Figure 2.10 Two multiplier lattice two-pair for all-pass digital filter implementation.
2.9 Frequency Transfer Functions 47

where H0 (z) = 1 and ki = Hi (∞). For example, from (2.91) it follows that

z −1 H0 (z) + k1 z −1 + k1
H1 (z) = =
1 + k1 z −1 H0 (z) 1 + k1 z −1
(2.92)
z −1 H1 (z) + k2 z −2 + k1 (1 + k2 )z −1 + k2
H2 (z) = −1
=
1 + k2 z H1 (z) 1 + k1 (1 + k2 )z −1 + k2 z −2
which correspond to the transfer functions of first-order and second-order all-
pass digital filters, respectively. Cascaded lattice realization of an nth-order
all-pass digital filter using two multiplier lattice two-pair modules is depicted
in Figure 2.11 where Hn (z) = Yn (z)/Un (z). It is known [8] that the nth-order
all-pass digital filter is stable if |ki | < 1 for i = 1, 2, · · · , n.
Alternatively, two multiplier lattice two-pair in Figure 2.10 can also be
implemented as per Figures 2.12 or 2.13 without altering the overall transfer
function.
We remark that Figure 2.12 is the single multiplier lattice two-pair [9],
which requires the fewest number of multipliers. Figure 2.13 is the normalized
lattice two-pair [10] where all internal nodes are automatically scaled in
the l2 sense [11]. By employing single multiplier lattice two-pair in Fig-
ure 2.12, the lattice structure of Figure 2.11 can be transformed into that of
Figure 2.14.

Un (z)
-kn -k2 -k1

kn k2 k1
Yn (z) z -1 z -1 z -1

Figure 2.11 Cascaded lattice realization of an nth-order all-pass digital filter.

ki
Ui (z)
-
Hi - 1 ( z)

Yi (z) z -1
Figure 2.12 Single multiplier lattice two-pair.
48 Discrete-Time Systems and z-Transformation

1-ki2
Ui (z)

ki -ki Hi - 1 ( z)

Yi (z) z -1
1-ki2
Figure 2.13 Normalized lattice two-pair.

kn k2 k1
Un (z)
- - -

Yn (z) z -1 z -1 z -1

Figure 2.14 Another cascaded lattice realization of an nth-order all-pass digital filter.

2.9.4 Notch Digital Filters


Notch digital filters are useful for removing a single-frequency component
from a signal such as an unmodulated carrier in communication systems or
power supply hum from a sampled analog signal, and so on.
The magnitude response of an ideal notch digital filter Hnotch (z) satisfies


0, for ω = ωo
|Hnotch (e )| = (2.93)
1, otherwise
where ωo is the notch frequency. However, (2.93) can only hold in theory
because zero bandwidth cannot be realized in practice. A realistic requirement
at ωo is to satisfy the specified 3 dB rejection bandwidth B. Note √ that the
frequencies ω1 and ω2 where the magnitude response goes to 1/ 2 are called √
the 3 dB cutoff frequencies, and if the magnitude response is less than 1/ 2
for ω1 < ω < ω2 , their difference B = ω2 − ω1 is called 3 dB rejection
bandwidth.
It can be shown that the actual transfer function of a single-frequency notch
digital filter with a notch frequency ωo of bandwidth B can be expressed in
the form [8]
1 + H2 (z)
H(z) = (2.94a)
2
2.9 Frequency Transfer Functions 49

where
k2 − k1 (1 + k2 )z −1 + z −2
H2 (z) = (2.94b)
1 − k1 (1 + k2 )z −1 + k2 z −2
is a second-order all-pass digital filter with

1 − tan B2
k1 = cos ωo , k2 = (2.94c)
1 + tan B2

The implementation of a single-frequency notch digital filter is depicted in


Figure 2.15 where H(z) = Y (z)/U (z).
The lattice structure of a second-order digital all-pass filter in (2.94b) is
shown in Figure 2.16 where H2 (z) = Y2 (z)/U2 (z).
By substituting (2.94b) into (2.94a), the transfer function of the single-
frequency notch digital filter is found to be

1 + k2 1 − 2k1 z −1 + z −2
H(z) = ·
2 1 − k1 (1 + k2 )z −1 + k2 z −2
(2.95)
1 + k2 z − 2k1 + z −1
= ·
2 z − k1 (1 + k2 ) + k2 z −1

U(z) Y(z)
1
H2 ( z) 2

Figure 2.15 Implementation of a single-frequency notch digital filter.

k2 -k1
U2 (z)
- -

Y2 (z) z -1 z -1
Figure 2.16 Lattice structure of a second-order all-pass digital filter in (2.94b).
50 Discrete-Time Systems and z-Transformation

and the frequency response of the filter in (2.95) is given by


1 + k2 ejω − 2k1 + e−jω
H(ejω ) =
2 ejω − k1 (1 + k2 ) + k2 e−jω
1 (2.96)
=
1 − k2 sin ω
1+j ·
1 + k2 cos ω − k1
whose magnitude response is described by
1
|H(ejω )| =  ) *2

 1 − k2 sin ω (2.97)
1+ ·
1 + k2 cos ω − k1

The notch frequency ωo , cutoff frequencies ω1 , ω2 (ω1 < ω2 ) and bandwidth


B = ω2 − ω1 of the magnitude response in (2.97) are illustrated in Figure 2.17.
Evidently, since k1 = cos ωo , the notch frequency is given by
ω = ωo (2.98)

Also, the cutoff frequencies ωi for i = 1, 2 must satisfy |H(ejωi )| = 1/ 2.
Hence,
) *2
1 − k2 sin ωi
· = 1 for i = 1, 2 (2.99a)
1 + k2 cos ωi − cos ωo

j
|H(e )|
1
1
2

0 1 0 2
Figure 2.17 Notch frequency ωo , cutoff frequencies ω1 , ω2 and bandwidth B = ω2 − ω1 of
the magnitude response in (2.97).
2.9 Frequency Transfer Functions 51

or equivalently,
) *2 ) *2
sin ωi 1 + k2
= for i = 1, 2 (2.99b)
cos ωi − cos ωo 1 − k2

must hold. By defining A = (1 + k2 )/(1 − k2 ), it can be derived from (2.99b)


that
sin ω1 sin ω2
= A, = −A (2.100)
cos ω1 − cos ωo cos ω2 − cos ωo
where A > 0. By solving (2.100), we obtain
sin ω1 + sin ω2 1 1
A= = ω2 − ω1 =
cos ω1 − cos ω2 tan tan B2 (2.101)
2
From (2.101) and A = (1 + k2 )/(1 − k2 ), it follows that

B 1 1 − k2 1 − tan B2
tan = = ⇐⇒ k2 = (2.102)
2 A 1 + k2 1 + tan B2
Hence, if k2 approaches unity, then we have
B B
tan and B 1 − k2 (2.103)
2 2
This reveals that if k2 1, then bandwidth B approaches zero, and (2.95)
approaches an ideal notch filter. For the sake of stability, |k1 | < 1 and |k2 | < 1
have been assumed in the above analysis.
There are some variants of a single-frequency notch filter in (2.95). By
employing an arithmetic-geometric mean inequality, we obtain
1 + k2 +
≥ k2 (2.104)
2
where equality holds provided that k2 = 1. Hence, if we assume k2 1 and
set k2 = ρ2 where 0 ρ < 1, then it follows from (2.95) that [12, 13]
1 − 2k1 z −1 + z −2
H(z) ρ
1 − 2k1 ρz −1 + ρ2 z −2
(2.105)
1 − 2k1 z −1 + z −2
1 − 2k1 ρz −1 + ρ2 z −2
52 Discrete-Time Systems and z-Transformation

or [14]
1 − 2k1 z −1 + ρ(2 − ρ)z −2
H(z) ρ (2.106)
1 − 2k1 ρz −1 + ρ2 z −2
The magnitude response of the notch digital filter in (2.106) is shown in
Figure 2.18 where ωo = 0.3π and ρ = 0.985.
The notch digital filter in (2.106) can be realized by a state-space model
as [14]
x(k + 1) = Ax(k) + bu(k)
(2.107)
y(k) = cx(k) + du(k)
where x(k) is a 2 × 1 state-variable vector, u(k) is a scalar input, y(k) is a
scalar output, and A, b, c and d are real constant matrices defined by
cos ωo − sin ωo  
A=ρ , c=ρ 1 1
sin ωo cos ωo
cos ωo − sin ωo
b = (ρ − 1) , d=ρ
cos ωo + sin ωo
From (2.107), it is observed that AAT = AT A holds, i.e., matrix A is
normal [15]. Without loss of generality, the coefficient matrices A and b of
the state-space model in (2.107) can be written as [14]
ρ α −β ρ−1 α−β
A= + , b= +
α2 + β 2 β α α2 + β 2 α+β
(2.108)
where α and β are real numbers.


|H(e )|
1

0 0.5π π ω
Figure 2.18 Magnitude response of a notch filter in (2.106) where ωo = 0.3π and ρ = 0.985.
2.9 Frequency Transfer Functions 53

2.9.5 Doubly Complementary Digital Filters


All-pass digital filters play an important role in doubly complementary filters
which find applications in various signal processing systems [8]. In this
section, two stable transfer functions G(z) and H(z) which are both all-pass
complementary, i.e.,
 
G(ejω ) + H(ejω ) = 1 for all ω (2.109)

and power complementary, i.e.,


   
G(ejω )2 + H(ejω )2 = 1 for all ω (2.110)

are considered. Such transfer function pairs are termed doubly complemen-
tary [16].
We now examine a parallel structure of all-pass digital filters A1 (z) and
A2 (z), shown in Figure 2.19.
The frequency responses of transfer functions G(z) = Y1 (z)/U (z) and
H(z) = Y2 (z)/U (z) specified in Figure 2.19 can be expressed as

1  
G(ejω ) = A1 (ejω ) + A2 (ejω )
2
1  jθ1 (ω) 
= e + ejθ2 (ω) (2.111)
2
θ1 (ω)+θ2 (ω) θ1 (ω) − θ2 (ω)
= ej 2 cos
2

1
2
A1 (z) Y1 (z)
U(z) 1
2
A2 (z) - Y2 (z)
Figure 2.19 Implementation of the doubly complementary filter pair as the sum and difference
of all-pass digital filters.
54 Discrete-Time Systems and z-Transformation

1  
H(ejω ) = A1 (ejω ) − A2 (ejω )
2
1  
= ejθ1 (ω) − ejθ2 (ω) (2.112)
2
θ1 (ω)+θ2 (ω) θ1 (ω) − θ2 (ω)
= j ej 2 sin
2
respectively. Evidently, the transfer function G(z) in (2.111) is a lowpass filter
and the transfer function H(z) in (2.112) is a highpass filter provided that

θ2 (ω), for 0 ≤ |ω| ≤ ωp
θ1 (ω) = (2.113)
θ2 (ω) ± π, for ωs ≤ |ω| ≤ π

where ωp and ωs denote the passband and stopband edges, respectively. It


is straightforward to show that the transfer functions G(z) = Y1 (z)/U (z)
and H(z) = Y2 (z)/U (z) satisfy both Equations (2.109) and (2.110)
simultaneously. Hence, they are doubly complementary.
When designing a lowpass (highpass) digital filter using all-pass digital
filters A1 (z) and A2 (z), it is a good choice to use delay elements z −(n−1) as
an all-pass filter A1 (z), i.e., A1 (z) = z −(n−1) . The phase specification of the
second all-pass filter A2 (z) is given by

−(n − 1)ω, for 0 ≤ |ω| ≤ ωp
θdesired (ω) = (2.114)
−(n − 1)ω − π, for ωs ≤ |ω| ≤ π

The phase specification of an all-pass digital filter A2 (z) is depicted together


with the magnitude response in Figure 2.20.
Efficient techniques are available to design an all-pass digital filter A2 (z)
approximating a given phase response. The reader is referred to Section 7.3
for further details.

2.10 Summary
This chapter has covered fundamental concepts relating to discrete-time
signals and systems. First, the concepts of a discrete-time sequence, the
z-transformation, and z-transform inversion have been introduced. The prop-
erties of discrete-time systems have then been discussed, and discrete-time
systems have been described in terms of difference equations as well as state-
space descriptions. Frequency responses have also been induced from the
References 55

Magnitude
1

0 ωp ωs π ω
Phase

−(n−1)π
−nπ

Figure 2.20 The phase specification of an all-pass digital filter A2 (z).

transfer functions. In addition, analysis and design issues of several useful


types of IIR digital filters, including all-pass digital filters, notch digital filters
and doubly complementary digital filters, are examined.

References
[1] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, NJ:
Prentice-Hall, 1975.
[2] Y. Sakawa, Linear System Control Theory, Tokyo, Japan, Asakura
Publishing, 1979.
[3] H. Kogo and T. Mita, Introduction to System Control Theory, Tokyo,
Japan, Jikkyo Shuppan, 1979.
[4] T. Higuchi, Fundamentals of Digital Signal Processing, Tokyo, Japan,
Shokodo, 1986.
[5] M. S. Santina, A. R. Stubberud and G. H. Hostetter, Digital Control
System Design, 2nd ed. Orlando, FL, Saunders College Publishing,
Harcourt Brace College Publishers, 1994.
[6] S. Takahashi and M. Ikehara, Digital Filters, Tokyo, Japan, Baifukan,
1999.
[7] M. Hagiwara, Digital Signal Processing, Tokyo, Japan, Morikita Pub-
lishing, 2001.
[8] P. A. Regalia, S. K. Mitra and P. P. Vaidynathan, “The digital all-pass
filter: A versatile signal processing building block,” Proc. IEEE, vol. 76,
no. 1, pp. 19–37, Jan. 1988.
56 Discrete-Time Systems and z-Transformation

[9] A. H. Gray, Jr., and J. D. Markel, “Digital lattice and ladder filter synthe-
sis,” IEEE Trans. Audio Electroacoust., vol. AU-21, no. 6, pp. 491–500,
Dec. 1973.
[10] A. H. Gray, Jr., and J. D. Markel, “A normalized filter structure,”
IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-23, no. 3,
pp. 268–270, June 1975.
[11] A. H. Gray, Jr., “Passive cascaded lattice digital filters,” IEEE Trans.
Circuits Syst., vol. CAS-27, no. 5, pp. 337–344, May 1980.
[12] T. S. Ng, “Some aspects of an adaptive digital notch filter with con-
strained poles and zeros,” IEEE Trans. Acoust., speech, Signal Process.,
vol. ASSP-35, no. 2, pp. 158-161, Feb. 1987.
[13] M. V. Dragosevic and S. S. Stankovic, “An adaptive notch filter with
improved tracking properties,” IEEE Trans. Signal Process., vol. 43,
no. 9, pp. 2068–2078, Sep. 1995.
[14] Y. Hinamoto and S. Nishimura, “Normal-form state-space realization of
single frequency IIR notch filters and its application to adaptive notch
filters,” in Proc. APCCAS 2016, pp. 599–602, Oct. 2016.
[15] R. E. Skelton and D. A. Wagie, “Minimal root sensitivity in linear
systems,” J. Guidance Contr., vol. 7, pp. 570–574, Sep.–Oct. 1984.
[16] P. P. Vaidynathan, S. K. Mitra and Y. Neuvo, “A new method of
low sensitivity filter realization,” IEEE Trans. Acoust., Speech, Signal
Process., vol. ASSP-34, no. 2, pp. 350–361, Apr. 1986.
3
Stability and Coefficient Sensitivity

3.1 Preview
A dynamic system can hardly be useful in practice unless it is stable, meaning
that it always produces a reasonable output in response to a reasonable
input. For the sake of quantitative analysis of digital filters and algorithmic
development of their designs, it is of fundamental importance to understand the
notion of stability in rigorous terms and various criteria for verifying stability.
In Section 3.2, after defining the bounded-input bounded-output stability for
IIR digital filters, several stability criteria in terms of impulse response and
system poles are presented. These include Schur-Cohn criterion, Schur-Cohn-
Fujiwara criterion, Jury-Marden criterion, and Lyapunov criterion. Another
issue addressed in this chapter is coefficient sensitivity. When the coefficients
of an IIR digital filter are quantized for implementation purposes, the coeffi-
cient error may lead to substantial changes in the filter characteristics such as
frequency response and stability. In Section 3.3, these changes are evaluated
by examining the changes in the locations of poles due to the changes in the
filter’s coefficients.

3.2 Stability
3.2.1 Definition
An IIR digital filter is said to be stable if every bounded input produces a
bounded output. A necessary and sufficient condition on the impulse response
for stability is given in the following theorem.

Theorem 3.1
A necessary and sufficient condition for an IIR digital filter to be stable is that
its impulse response {h(n)} is absolutely summable, namely,

57
58 Stability and Coefficient Sensitivity


|h(n)| < ∞ (3.1)
n=−∞

Proof
Assume that (3.1) does not hold, i.e., the sum on the left-hand side of (3.1) is
not bounded. Consider the sequence x(n) defined by

1 if h(−n) ≥ 0
x(n) =
−1 if h(−n) < 0

which is obviously a bounded sequence. With this {x(n)} as an input, the


output of the filter at n = 0 is given by

 ∞

y(0) = x(m)h(−m) = |h(m)| = ∞
m=−∞ m=−∞

which shows that (3.1) is a necessary condition. To prove the sufficiency,


assume that (3.1) holds and {x(n)} is bounded. Then we have
 
 ∞ 
 
|y(n)| =  x(m)h(n − m)
 
m=−∞
∞ ∞

≤ |x(m)| |h(n − m)| ≤ M |h(n − m)| < ∞
m=−∞ m=−∞

where M is a bound of x(n). Hence (3.1) is also a sufficient condition and the
proof is complete. 

3.2.2 Stability in Terms of Poles


For a causal IIR digital filters whose transfer function is a rational function of
the form
M
ai z −i
i=0
H(z) = N with b0 = 1 (3.2)
 −i
bi z
i=0
its stability can be characterized in terms its poles which are defined as the

N
roots {pi | i = 1, 2, · · · , N } of the equation bi z −i = 0 in the z plane.
i=0
3.2 Stability 59

Theorem 3.2
The IIR filter in (3.2) is stable if and only if all its poles are located strictly
inside the unit circle of the z plane, namely,

|pi | < 1 for i = 1, 2, · · · , N (3.3)

Proof
We begin by writing the transfer function in terms of its impulse response as


H(z) = h(n)z −n
n=−∞

Because the filter is causal, h(n) vanishes for every n < 0, hence we have


H(z) = h(n)z −n
n=0

If the filter is stable, then (3.1) implies that for any z = rejω with r ≥ 1 we
have
∞ ∞ ∞
      
H(rejω ) ≤ h(n)r−n e−jnω  = −n
r |h(n)| ≤ |h(n)| < ∞
n=0 n=0 n=0

which implies that any pole of H(z) cannot be in the region {z: |z| ≥ 1}, i.e.
(3.3) must hold. Next, assume that (3.3) is satisfied. For n > 0, the impulse
response h(n) can be expressed as
 N

1
h(n) = H(z)z n−1 dz = pn−1
i resz=pi H(z)
2πj Γ i=1

where the residue resz=pi H(z) is known to be finite, hence |resz=pi


H(z)| ≤ R̄ for some constant R̄ and pi = ri ejψi with ri ≤ r̄ for some r̄ < 1.
It follows that

 ∞

|h(n)| ≤ |h(0)| + N R̄ r̄n−1 < ∞
n=0 n=1

hence the filter is stable which completes the proof. 


60 Stability and Coefficient Sensitivity

3.2.3 Schur-Cohn Criterion


Given an N th-order polynomial B(z) of the form
N

B(z) = bi z N −i (3.4)
i=0

the Schur-Cohn criterion [1] examines whether or not all its zeros are inside
the unit circle of the z plane by evaluating the determinants of N matrices
whose sizes vary from 2 × 2 to 2N × 2N .

Theorem 3.3: Schur-Cohn


All zeros of polynomial B(z) are strictly inside the unit circle of the z plane
if and only if, for k = 1, 2, · · · , N ,

det S k < 0 if k is odd and det S k > 0 if k is even


where  
Ak Bk
Sk =
BTk ATk
⎡ ⎤
bN 0 0 ··· 0
⎢ b bN 0 ··· 0 ⎥
⎢ N −1 ⎥
Ak = ⎢
⎢ .. .. .. ..


⎣ . . . . ⎦
bN −k+1 bN −k+2 bN −k+3 · · · bN
⎡ ⎤
b0 b1 b2 · · · bk−1
⎢ 0 b b · · · bk−2 ⎥
⎢ 0 1 ⎥
Bk = ⎢ ⎢ .. . . ..


⎣ . .. .. . ⎦
0 0 0 · · · b0

3.2.4 Schur-Cohn-Fujiwara Criterion


A stability criterion based on the Schur-Cohn criterion with improved effi-
ciency was described by Fujiwara [1].

Theorem 3.4: Schur-Cohn-Fujiwara


All zeros of polynomial B(z) are strictly inside the unit circle of the z plane
if and only if the matrix F = (fij ) of size N × N is positive definite, where
3.2 Stability 61

min(i,j)
  
fij = bi−k bj−k − bN −i+k bN −j+k (3.5)
k=1

3.2.5 Jury-Marden Criterion


An efficient and easy-to-use stability criterion was developed by Jury [1] using
a result of Marden [1] that considerably simplifies the calculations involved
in Schur-Cohn criterion. In this criterion, an array of numbers known as the
Jury-Marden array is constructed as follows: The first two rows of the array are
just the coefficients of polynomial B(z) in ascending and descending orders,
respectively, see Table 3.1.
The elements of the third and fourth rows are calculated as
 
 b0 bN −i 
 
ci =   for i = 0, 1, · · · , N − 1
 bN bi 
and those of fifth and sixth rows as
 
 c0 cN −1−i 

di =   for i = 0, 1, · · · , N − 2
 cN −1 ci 

and so on until a total of 2N − 3 rows are computed. There will be three


components in the last row, which are denoted as r0 , r1 , and r2 . The criterion
can now be stated as

Theorem 3.5: Jury-Marden


All zeros of polynomial B(z) are strictly inside the unit circle of the z plane
if and only if the following conditions are satisfied:

Table 3.1 The Jury-Marden array [1]


Row Coefficients
1 b0 b1 ··· bN
2 bN bN −1 · · · b0
3 c0 c1 ··· cN −1
4 cN −1 cN −2 · · · c0
5 d0 d1 ··· dN −2
6 dN −2 dN −3 · · · d0
··· ············
2N − 3 r0 r1 r2
62 Stability and Coefficient Sensitivity

(i) D(1) > 0


(ii) (−1)N D(−1) > 0
(iii) b0 > |bN |
|c0 | > |cN −1 |
|d0 | > |dN −2 |
..
.
|r0 | > |r2 |

3.2.6 Stability Triangle of Second-Order Polynomials


By applying the Jury-Marden criterion to a second-order polynomial of the
form
B(z) = z 2 + b1 z + b2 (3.6)
it can be concluded that B(z) is stable if and only if coefficients {b1 , b2 }
satisfy
b2 < 1, b1 + b2 > −1, and b1 − b2 < 1 (3.7)
The region defined by the three linear constraints in (3.7) is depicted in Figure
3.1, which is often referred to as the stability triangle [2, 3].

3.2.7 Lyapunov Criterion


An IIR digital filter characterized in state space as

x(k + 1) = Ax(k) + bu(k)


(3.8)
y(k) = cx(k) + du(k)

b2
1

−2 2 b1

−1

Figure 3.1 Stability triangle.


3.2 Stability 63

is stable if and only if the magnitudes of the eigenvalues of system matrix A


are strictly less than unity.

Theorem 3.6: Lyapunov


The state-space digital filter in (3.8) is stable if and only if for a positive
definite matrix Q (say, Q = I ) there exists a unique positive definite matrix P
that satisfies the Lyapunov equation
AT PA − P = −Q (3.9)
Proof
Suppose there exists a positive definite P that satisfies the equation in (3.9). Let
λ be any one of the eigenvalues of A. By definition, there exists an eigenvector
x = 0 such that Ax = λx. Multiplying (3.9) by x from right and xT from
left, we obtain xT AT P Ax − xT P x = −xT Qx which implies that
(|λ|2 − 1)xT P x = −xT Qx (3.10)
Hence
xT Qx
|λ|2 − 1 = − <0 (3.11)
xT P x
because both xT P x and xT Qx are strictly positive. This shows that the
magnitudes of the eigenvalues of A are strictly less than unity, therefore the
system is stable.
Conversely, suppose the system is stable. For a given positive definite Q,
we can construct a matrix P as


P = (AT )k QAk (3.12)
k=0

Because the system is stable, the magnitudes of all eigenvalues of A are


strictly less than unity, hence the above series is well defined. Moreover, P is
positive definite because it is the sum of a positive define matrix Q plus many
other terms each of which is at least positive semidefinite. Furthermore, we
can compute

 ∞

AT P A − P = (AT )k+1 QAk+1 − (AT )k QAk = −Q (3.13)
k=0 k=0

Thus the positive definite matrix P constructed above satisfies (3.9), which
completes the proof. 
64 Stability and Coefficient Sensitivity

3.3 Coefficient Sensitivity


The difference equation associated with the IIR digital filter in (3.2) is given
by
M
 N
y(k) = ai x(k − i) − bi y(k − i) (3.14)
i=0 i=1

When the filter coefficients {ai } and {bi } are quantized for implementation
purposes, the coefficient error may lead to substantial changes in the filter
characteristics such as frequency response and stability. These changes can
be investigated by examining the changes in the locations of poles due to the
changes in the filter’s coefficients.
The transfer function in (3.2) can be expressed as
 
z N −M a0 z M + a1 z M −1 + · · · + aM  Ã(z)
H(z) = = (3.15)
z N + b1 z N −1 + · · · + bN B̃(z)

The polynomials Ã(z) and B̃(z) in (3.15) can be written as


N

N −M
 M M −1

Ã(z) = z a0 z + a1 z + · · · + aM = a0 (z − zi ) (3.16)
i=1

and
N
 N

B̃(z) = bk z N −k = (z − pi ) (3.17)
k=0 i=1

where {zi } and {pi } are the zeros and poles of the IIR filter, respectively.
To examine how changes in coefficients {bk } lead to change in pole pi , we
assume that the N poles are distinct and denote by dpi the infinitesimal change
in pi due to infinitesimal changes db1 , db2 , . . . , dbn in {bk }. The rule of total
differentiation gives
n 
 ∂pi 
dpi = dbk
∂bk  z=pi
k=1
where  
 ∂ B̃(z) ∂bk 
∂pi   pN
i
−k
=  =−
∂bk  z=pi ∂ B̃(z) ∂pi 
 
N
z=pi (pi − pj )
j=1, j=i
3.4 Summary 65

Therefore, we have
n

1
dpi = − pN
i
−k
dbk (3.18)

N
(pi − pj ) k=1
j=1, j=i

Based on (3.18), several observations can be made as follows [3]:


1. Coefficient sensitivity increases when the poles are close together.
2. Coefficient sensitivity increases when a pole pi moves closer to the unit
circle because the magnitude of each pN i
−k
gets larger.
3. The filter is most sensitive to the variations of the last coefficient bN
because in (3.18) it is associated with p0i = 1 that is the largest among all
pNi
−k
. In general, the filter’s sensitivity to the variations of coefficient bi
is greater than that of coefficient bj as long as i > j.
For a high-order IIR filter with sharp transitions in its frequency response,
typically the poles are not well-separated, hence the coefficient sensitivity
of the filter can be high. A less sensitive realization of a high-order filter
is to break the transfer function into lower-order sections and connect these
section in cascade or parallel. In this way, the poles within each section can
be well-separated and reduced overall coefficient sensitivity can be achieved.
Although the use of fourth- or higher-order sections may be justified in some
applications, second-order section has been the most-often employed building
block in cascade and parallel structures.

3.4 Summary
This chapter presents several criteria that can be used to verify the stability of an
IIR digital filters. The Schur-Cohn criterion, Schur-Cohn-Fujiwara criterion,
and Jury-Marden criterion are of use when the transfer function of the IIR filter
is given. The stability triangle defined by (3.7) is particularly convenient when
the denominator of the IIR filter is given in terms of a product of second-order
sections. The Lyapunov criterion is suitable for IIR filters that are modeled in
state space. This chapter also presents a brief study on coefficient sensitivity.
This is done by examining the changes in the locations of poles due to the
changes in the filter’s coefficients.
66 Stability and Coefficient Sensitivity

References
[1] E. I. Jury, Theory and Application of the z-Transform Method,
John Wiley, New York, 1964.
[2] A. Antoniou, Digital Filters: Analysis, Design, and Applications, 2nd ed.,
McGraw-Hill, New York, 1993.
[3] R. A. Roberts and C. T. Mullis, Digital Signal Processing, Addison-
Wesley, Reading MA, 1987.
4
State-Space Models

4.1 Preview
We have introduced the external (input-output) description and state-space
description of dynamical linear systems in Chapter 2. Unlike the external
description, by adequately defining a state-variable vector and establishing
dynamical relations between the state variables and system’s input and
output, a state-space description reveals a great deal of internal structure
of a dynamical linear system. In this chapter, the state-space models of
dynamical linear systems are studied more systematically. In Section 4.2,
a necessary and sufficient condition for a state-space model to be controllable
and observable is described and proved. In Section 4.3, Faddeev’s formula
for deriving its transfer function from a given state-space model is presented.
In Section 4.5, equivalent transformation is defined with its application to
the derivation of various representations such as canonical form, balanced
form, input-normal form, output-normal form and so on. In Section 4.6,
Kalman’s canonical structure theorem is introduced. In Section 4.6, the
problems of minimal realization and minimal partial realization are addressed
by means of the Hankel matrix. Finally, a passivity property of discrete-
time linear systems, known as lossless bounded-real lemma, is examined in
Section 4.7.

4.2 Controllability and Observability


Consider a state-space model (A, b, c, d)n described by
x(k + 1) = Ax(k) + bu(k)
(4.1)
y(k) = cx(k) + du(k)
where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k)
is a scalar output, and A, b, c and d are n × n, n × 1, 1 × n, and 1 × 1

67
68 State-Space Models

real constant matrices, respectively. The state space of the model in (4.1)
is an n-dimensional real vector space and is denoted by Σ. The first and
second equations in (4.1) are called the state equation and output equation,
respectively. A block-diagram of the state-space model in (4.1) is depicted in
Figure 4.1.
From (4.1), it follows that

x(k) = Ak x(0) + Ak−1 bu(0) + · · · + Abu(k − 2) + bu(k − 1)


y(k) = cAk x(0) + cAk−1 bu(0) + · · · + cAbu(k − 2) (4.2)
+ cbu(k − 1)+du(k)

Definition 4.1
The state equation in (4.1) is said to be controllable if for any initial state x(0)
in the state space Σ, and any state xd in Σ, there exist a finite step ko > 0
and an input sequence that will transfer the initial state x(0) to the state xd at
step ko . Otherwise, the state equation is said to be uncontrollable.

Definition 4.2
The state-space model (A, b, c, d)n in (4.1) is said to be observable if for any
initial state x(0) in the state space Σ, there exist a finite step ko > 0 such
that the knowledge of the input and output over the finite interval 0 ≤ k ≤ ko
suffices to determine the initial state x(0) uniquely. Otherwise, the state-space
model (A, b, c, d)n is said to be unobservable.

Definition 4.3
The state-space model (A, b, c, d)n in (4.1) is said to be minimal if it is
controllable and observable.

Figure 4.1 A state-space model.


4.2 Controllability and Observability 69

From (4.2), it follows that


⎡ ⎤
u(n − 1)
⎢ ⎥
 ⎢u(n − 2)⎥
x(n) − An x(0) = b Ab · · · An−1 b ⎢ .. ⎥
⎣ . ⎦
u(0)
⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤
y(0) c d 0 ··· 0 u(0)
⎢ y(1) ⎥ ⎢ cA ⎥ ⎢ .. .. ⎥ ⎢ u(1) ⎥
⎢ ⎥ ⎢ ⎥ ⎢ cb d .⎥ ⎥
⎢ ⎥ = ⎢ .. ⎥ x(0) + ⎢
. ⎥⎢⎢ ⎥

..
⎦ ⎣ ⎦ ⎢ . . . ⎥ ⎣
..

. . ⎣ .
. . . . . 0⎦ .
n−1
y(n − 1) cA cAn−2 b · · · cb d u(n − 1)
(4.3)
By using (4.3) and applying the Cayley-Hamilton theorem described in Section
4.3.3, we can readily obtain the following two theorems.

Theorem 4.1
The state equation in (4.1) is controllable if and only if
 
rank b Ab · · · An−1 b = n (4.4)

Theorem 4.2
The state-space model (A, b, c, d)n in (4.1) is observable if and only if
⎡ ⎤
c
⎢ cA ⎥
⎢ ⎥
rank ⎢ .. ⎥ = n (4.5)
⎣ . ⎦
cAn−1
   T
The matrices V n = b Ab · · · An−1 b and U n = cT (cA)T · · · (cAn−1 )T
are called the controllability matrix and observability matrix, respectively.
Under the assumption that Ak converges to 0 as k goes to infinity (i.e., the
state-space model (A, b, c, d)n is stable), we define two symmetric matrices
K c and W o as

Kc = Ak bbT (Ak )T
k=0

(4.6)
k T T k
Wo = (A ) c cA
k=0
70 State-Space Models

These marices K c and W o can be obtained by solving the Lyapunov equations

K c = AK c AT + bbT
(4.7)
W o = AT W o A + cT c

Marices K c and W o are called the controllability Grammian and the observ-
ability Grammian, respectively. It can readily be shown that if the state-space
model in (4.1) is controllable and observable, then K c and W o are positive
definite.

4.3 Transfer Function


4.3.1 Impulse Response
By taking the z-transform of both sides in (4.1), we obtain

z[X(z) − x(0)] = AX(z) + b U (z)


(4.8)
Y (z) = cX(z) + d U (z)

where X(z), U (z) and Y (z) denote the z-transforms of state-variable vector
x(k), input u(k) and output y(k), respectively. From (4.8), we have

X(z) = (zI n − A)−1 zx(0) + (zI n − A)−1 b U (z)


  (4.9)
Y (z) = c(zI n − A)−1 zx(0) + c(zI n − A)−1 b + d U (z)

The first and second terms on the right-hand side of (4.9) represent the transient
response and steady state characteristic, respectively. By setting x(0) = 0,
the transfer function H(z) of the state-space model (A, b, c, d)n in (4.1) is
obtained as
Y (z)
H(z) = = c(zI n − A)−1 b + d (4.10)
U (z)
By noting that

(zI n − A)−1 = Iz −1 + Az −2 + A2 z −3 + · · ·
∞ (4.11)
= Ai−1 z −i
i=1
4.3 Transfer Function 71

we can write (4.10) as



H(z) = d + cAi−1 b z −i
i=1

(4.12)
= hi z −i
i=0

where
h0 = d, hi = cAi−1 b for i = 1, 2, 3, · · ·
Sequence {h0 , h1 , h2 , · · · } is called the unit-pulse response or impulse
response of the state-space model (A, b, c, d)n in (4.1).

4.3.2 Faddeev’s Formula


To compute the transfer function in (4.10), we express the inverse of
(zI n − A) as
adj(zI n − A)
(zI n − A)−1 = (4.13a)
det(zI n − A)
where det(zI n − A) and adj(zI n − A) stand for the determinant and adjoint
matrix of (zI n − A), respectively, and the denominator and numerator in
(4.13a) are given by
det(zI n − A) = z n + a1 z n−1 + · · · + an−1 z + an
(4.13b)
adj(zI n − A) = B 0 z n−1 + B 1 z n−2 + · · · + B n−2 z + B n−1
respectively. The scalars a1 , a2 , · · · , an and the matrices B 0 , B 1 , · · · , B n−1
in (4.13b) can be computed from
B0 = I n, B i = AB i−1 + ai I n (4.14a)
1
ai = − tr[AB i−1 ] for i = 1, 2, · · · , n (4.14b)
i
where B n = 0 that can be used to check whether the program is correct or
not. The equations in (4.14) are known as Faddeev’s formula.

Proof
From (4.13a), it follows that
(zI n − A) adj(zI n − A) = det(zI n − A)I n (4.15)
72 State-Space Models

Substituting (4.13b) into (4.15) provides


n−1 n
n−1−i
(zI n − A) Biz = ai z n−i I n , a0 = 1 (4.16)
i=0 i=0

By making a comparison of coefficients between both sides of (4.16), we


obtain
B0 = I n
B i = AB i−1 + ai I n for i = 1, 2, · · · , n − 1 (4.17)
0 = AB n−1 + an I n
which is the same as (4.14a). Moreover, it can easily be verified that
n n
∂ det(zI n − A) ∂ det(zI n − A) ∂ (zI n − A)ij
= ·
∂z ∂ (zI n − A)ij ∂z
i=1 j=1
n n
   (4.18)
= adj(zI n − A) ji · I n ij
i=1 j=1
 
= tr adj(zI n − A)
where (A)ij denotes the (i, j)th element of matrix A. By substituting (4.13b)
into (4.18), we obtain
nz n−1 + (n − 1)a1 z n−2 + · · · + an−1
(4.19)
= (trB 0 )z n−1 + (trB 1 )z n−2 + · · · + trB n−1
By making a comparison of coefficients between both sides of (4.19), we have
1
ai = trB i for i = 1, 2, · · · , n − 1 (4.20)
n−i
Finally, substituting B i in (4.17) into (4.20) gives
1
ai = tr[AB i−1 + ai I n ]
n−i
(4.21)
1 n
= tr[AB i−1 ] + ai
n−i n−i
which readily yields (4.14b). This completes the proof of Faddeev’s
formula. 
4.4 Equivalent Systems 73

4.3.3 Cayley-Hamilton’s Theorem


By setting the denominator polynomial of (4.13b) to zero, we obtain

det(zIn − A) = z n + a1 z n−1 + · · · + an = 0 (4.22)

which is called the characteristic equation. From (4.14a), it follows that


B 1 = AB 0 + a1 I n = A + a1 I n

B 2 = AB 1 + a2 I n = A2 + a1 A + a2 I n
(4.23)
..
.
B n = AB n−1 + an I n = An + a1 An−1 + · · · + an I n
Since B n = 0, we readily obtain

An + a1 An−1 + · · · + an I n = 0 (4.24)

which is known as the Cayley-Hamilton theorem. 


It is noted that the values of z satisfying (4.22) coincide with the eigenval-
ues of matrix A, and they are called the poles of the transfer function H(z).

4.4 Equivalent Systems


4.4.1 Equivalent Transformation
In this section, we introduce the concept of equivalent linear time-invariant
causal dynamical systems. This concept is found to be useful in constructing
canonical forms as well as developing input-normal, output-normal, and
balanced state-space descriptions.

Definition 4.4
Let T be an n × n nonsingular real matrix, and let x(k) = T −1 x(k). Then
the state-space model (A, b, c, d)n described by

x(k + 1) = Ax(k) + bu(k)


(4.25)
y(k) = c x(k) + du(k)

is said to be equivalent to the state-space model (A, b, c, d)n in (4.1) and T is


called the equivalent transformation matrix, similarity transformation matrix
or coordinate transformation matrix where
74 State-Space Models

A = T −1 AT , b = T −1 b, c = cT
By taking the z-transform of both sides in (4.25), we obtain
 
z X(z) − x(0) = A X(z) + b U (z)
(4.26)
Y (z) = cX(z) + d U (z)

where X(z) is the z-transform of state-vaiable vector x(k). By setting x(0) =


0, the transfer function H(z) of the state-space model (A, b, c, d)n in (4.25)
can be written as
Y (z)
H(z) = = c(zI n − A)−1 b + d
U (z)
= c(zI n − A)−1 b + d (4.27)

= H(z)

In other words, the transfer function of a state-space model remains invariant


under equivalent transformation. That is, H(z) = H(z) holds true for any
equivalent transformation defined by x(k) = T −1 x(k).

4.4.2 Canonical Forms


1. Controllable Canonical Form
If the state-space model (A, b, c, d)n in (4.1) is controllable, then by
applying an equivalent transformation x(k) = T −1 c x(k) with T c =
[b Ab · · · An−1 b], the model can be transformed into the following
controllable canonical form:
⎡ ⎤ ⎡ ⎤
0 · · · 0 −an 1
⎢ . . .. .. ⎥ ⎢ ⎥
⎢1 . . . ⎥ ⎢0⎥
x(k + 1) = ⎢ . . ⎥ x(k) + ⎢ .. ⎥ u(k)
⎣ .. . . 0 −a ⎦ ⎣.⎦ (4.28)
2
0 · · · 1 −a1 0
 
y(k) = h1 h2 · · · hn x(k) + du(k)

By writing the Cayley-Hamilton theorem in (4.24) as

An b = −an b − an−1 Ab − · · · − a1 An−1 b (4.29)


4.4 Equivalent Systems 75

Equation (4.28) can easily be verified by


 
AT c = Ab A2 b · · · An b
⎡ ⎤
0 · · · 0 −an
⎢ . . .. ⎥
⎢ 1 . . .. . ⎥
= Tc ⎢ . . ⎥
⎣ .. . . 0 −a ⎦
2
0 · · · 1 −a1 (4.30)
⎡ ⎤
1
⎢0⎥  
⎢ ⎥
b = T c ⎢ .. ⎥ , cT c = h1 h2 · · · hn
⎣.⎦
0

where hi = cAi−1 b for i ≥ 1 in (4.12).


Moreover, if an equivalent transformation x(k) = T −1 x(k) with
⎡ ⎤
an−1 · · · a1 1
⎢ .. ..
. ⎥
⎢ . 1 0⎥
T = Tc ⎢ . .⎥ (4.31)
⎣ a . . . . .. ⎦
.
1
1 0 ··· 0

is applied to the state-space model (A, b, c, d)n in (4.1), the model can be
transformed into the following controllable canonical form:
⎡ ⎤ ⎡ ⎤
0 1 ··· 0 0
⎢ .. . . .. ⎥ ⎢ .. ⎥
⎢ . . . . . ⎥ ⎢ ⎥
x(k + 1) = ⎢ . ⎥ x(k) + ⎢ . ⎥ u(k)
⎣ 0 ··· 0 1 ⎦ ⎣0⎦
−an · · · −a2 −a1 1
 n n−1

y(k) = an−i hi an−1−i hi · · · h1 x(k) + du(k)
i=1 i=1
(4.32)
76 State-Space Models

This can easily be shown by noticing the following:


⎡ ⎤
⎡ ⎤ −an 0 ··· 0 0
an−1 · · · a1 1
⎢ 0 a n−2 · · · a1 1⎥
⎢ .. ..
. ⎥ ⎢ ⎥
−1 ⎢ . 1 0 ⎥ ⎢ .. .. . ⎥
T c AT c ⎢ . ⎥=⎢ . . .. 1 0⎥
⎣ a .
. . . . ..
. ⎦ ⎢ .. ⎥
1 ⎣ 0 a1 . . . ..
. .⎦
1 0 ··· 0
0 1 0 ··· 0
⎡ ⎤⎡ ⎤
an−1 · · · a1 1 0 1 ··· 0
⎢ .. . . ⎥ ⎢ .. . .. .. ⎥
⎢ . . 1 0 ⎥⎢ . .. . . ⎥
=⎢ . ⎥ ⎢ ⎥
⎣ a . . . . .. ⎦ ⎣ 0
. . ··· 0 1 ⎦
1
1 0 ··· 0 −a n ··· −a2 −a1
⎡ ⎤⎡ ⎤ ⎡ ⎤
an−1 · · · a1 1 0 1
⎢ .. . ⎥ ⎢ ⎥ ⎢ ⎥
⎢ . . . 1 0 ⎥ ⎢ .. ⎥ ⎢ 0 ⎥
.
⎢ ⎥ ⎢ ⎥ = ⎢ . ⎥ = T −1 c b
⎣ a . . . . . . ... ⎦ ⎣ 0 ⎦ ⎣ .. ⎦
1
1 0 ··· 0 1 0
(4.33)

Moreover, we have
⎡ ⎤
1
 ⎢ ⎥ z
⎢ ⎥
adj zI n − T −1 AT T −1 b = ⎢ ⎥ .. (4.34)
⎣ ⎦ .
z n−1

which leads the transfer function to


 T
cT 1 z · · · z n−1
H(z) =  +d
det zI n − T −1 AT
(4.35)
b1 z n−1 + b2z n−2 + · · · + bn
= +d
z n + a1 z n−1 + · · · + an
where
l
bl = al−i hi with a0 = 1 for l = 1, 2, · · · , n
i=1
4.4 Equivalent Systems 77

2. Observable Canonical Form


If the state-space model (A, b, c, d)n in (4.1) is observable, then by
applying an equivalent transformation x(k) = T o x(k) with T o =
[cT (cA)T · · · (cAn−1 )T ]T , the model can be transformed into the following
observable canonical form:
⎡ ⎤ ⎡ ⎤
0 1 ··· 0 h1
⎢ .. .. .. .. ⎥ ⎢ h2 ⎥
⎢ . . . ⎥ ⎢ ⎥
x(k + 1) = ⎢ . ⎥ x(k) + ⎢ .. ⎥ u(k)
⎣ 0 ··· 0 1 ⎦ ⎣ . ⎦
(4.36)
−an · · · −a2 −a1 hn
 
y(k) = 1 0 · · · 0 x(k) + du(k)

By writing the Cayley-Hamilton theorem in (4.24) as

cAn = −an c − an−1 cA − · · · − a1 cAn−1 (4.37)

Equation (4.36) can easily be demonstrated by


⎡ ⎤ ⎡ ⎤
cA 0 1 ··· 0
⎢ cA2 ⎥ ⎢ .. .. .. .. ⎥
⎢ ⎥ ⎢ . . . ⎥
T oA = ⎢ . ⎥ = ⎢ . ⎥To
⎣ . ⎦ ⎣ 0
. ··· 0 1 ⎦
cAn −an · · · −a2 −a1
⎡ ⎤ (4.38)
h1
⎢ h2 ⎥  
⎢ ⎥
T o b = ⎢ .. ⎥ , c = 1 0 ··· 0 To
⎣ . ⎦
hn

where hi = cAi−1 b for i ≥ 1 in (4.12).


Moreover, if an equivalent transformation x(k) = T x(k) with
⎡ ⎤
an−1 · · · a1 1
⎢ .. ..
. ⎥
⎢ . 1 0⎥
T =⎢ ⎥To (4.39)
⎣ a . . . . . . ... ⎦
1
1 0 ··· 0
78 State-Space Models

is applied to the state-space model (A, b, c, d)n in (4.1), the model can be
transformed into the following observable canonical form:
⎡ n ⎤
⎡ ⎤ ⎢ an−i hi ⎥
0 · · · 0 −an ⎢ i=1 ⎥
⎢ ⎥
⎢ . . .. .. ⎥ ⎢n−1 ⎥
⎢1 . . . ⎥ ⎢ ⎥
x(k + 1) = ⎢ . . ⎥ x(k) + ⎢ an−1−i hi ⎥ u(k)
⎣ .. . . 0 −a ⎦ ⎢ ⎥ (4.40)
2 ⎢ i=1 ⎥
0 · · · 1 −a1 ⎢ .. ⎥
⎣ . ⎦
h1
 
y(k) = 0 · · · 0 1 x(k) + du(k)

This can easily be verified by noticing the following:


⎡ ⎤
⎡ ⎤ −an 0 ··· 0 0
an−1 · · · a1 1
⎢ 0 an−2 · · · a1 1⎥
⎢ .. ..
. ⎥ ⎢ . ⎥
⎢ . 1 0⎥ −1 ⎢ . .. . ⎥
⎢ . ⎥ T o AT o = ⎢ . . .. 1 0⎥
⎣ a . . . . .. ⎦
. . ⎢ .. ⎥
1 ⎣ 0 a1 . . . ..
. .⎦
1 0 ··· 0
0 1 0 ··· 0
⎡ ⎤⎡ ⎤
0 · · · 0 −an an−1 · · · a1 1
⎢ . . .. ⎥
. ⎢ . . ⎥
⎢ 1 . . .. ⎥ ⎢ .. .. 1 0⎥
=⎢ . . ⎥⎢ . .⎥
⎣ .. . . 0 −a ⎦ ⎣ a . . . . .. ⎦
.
2 1
0 · · · 1 −a1 1 0 ··· 0
⎡ ⎤
an−1 · · · a1 1
⎢ ⎥
. .
 ⎢ .. .. 1 0⎥  
0 ··· 0 1 ⎢ ⎥= 1 0 · · · 0 = cT −1
o
⎣ a . . . . . . ... ⎦
1
1 0 ··· 0
(4.41)

We conclude this section by introducing the concept of duality for two discrete-
time dynamical linear systems. Two systems in (4.32) and (4.40) are said
to be dual, since the controllability of system in (4.32) is equivalent to the
observability of system in (4.40), and the observability of system in (4.32) is
equivalent to the controllability of system in (4.40). Notice that their transfer
functions are identical.
4.4 Equivalent Systems 79

4.4.3 Balanced, Input-Normal, and Output-Normal


State-Space Models
In this section, the state-space model (A, b, c, d)n in (4.1) is assumed to
be stable, controllable and observable. From (4.6), the controllability and
observability Grammians K c and W o for the state-space model (A, b, c, d)n
in (4.25) can be written as

K c = T −1 K c T −T , W o = T T W oT (4.42)

which lead to
K c W o = T −1 K c W o T (4.43)
Therefore, the eigenvalues of K c W o are invariant under the algebraic equiv-
alence. Since matrices K c and W o are symmetric and positive definite, it is
obvious that the eigenvalues of K c W o are all strictly positive. We thus denote
the ith eigenvalue of K c W o by σi2 for i = 1, 2, · · · , n with the ordering
σ1 ≥ σ2 ≥ · · · ≥ σn > 0 and define

Σ = diag{σ1 , σ2 , · · · , σn } (4.44)

Definition 4.5

(1) A state-space model (A, b, c, d)n in (4.25) is said to be balanced if


K c = W o = Σ.
(2) The state-space model (A, b, c, d)n is said to be input-normal if K c = I n
and W o = Σ2 .
(3) The state-space model (A, b, c, d)n is said to be output-normal if
K c = Σ2 and W o = I n .
By applying the Cholesky decomposition to K c , we have

K c = LLT (4.45)

where L is an n × n lower triangular matrix. Let S and Σ be obtained by


eigenvalue-eigenvector decomposition of LT W o L as

LT W o L = SΣ2 S T (4.46)

where Σ2 and S are n × n diagonal and orthogonal matrices composed of the


eigenvalues and eigenvectors of LT W o L, respectively, and S T S = I n .
80 State-Space Models

1. Balanced State-Space Model


Suppose an equivalent transformation x(k) = T −1 x(k) with
1
T = LSΣ− 2 (4.47)
is applied, we obtain
Kc = W o = Σ (4.48)
Hence an equivalent state-space model is balanced, and the balanced state-
space model (A, b, c, d)n is characterized by
1 1
A = Σ 2 S T L−1 ALSΣ− 2
1
(4.49)
T −1 − 12
b=Σ S L 2 b, c = cLSΣ

2. Input-Normal State-Space Model


Suppose an equivalent transformation x(k) = T −1 x(k) with
T = LS (4.50)
is applied, we can derive
K c = I n, W o = Σ2 (4.51)
Hence an equivalent state-space model is input-normal, and the input-normal
state-space model (A, b, c, d)n is specified by
A = S T L−1 ALS
(4.52)
b = S T L−1 b, c = cLS

3. Output-Normal State-Space Model


Suppose an equivalent transformation x(k) = T −1 x(k) with
T = LSΣ−1 (4.53)
is applied, we have
K c = Σ2 , W o = In (4.54)
Hence an equivalent state-space model is output-normal, and the output-
normal state-space model (A, b, c, d)n is found to be
A = ΣS T L−1 ALSΣ−1
(4.55)
b = ΣS T L−1 b, c = cLSΣ−1
4.5 Kalman’s Canonical Structure Theorem 81

The Use of Balanced State-Space Model for Reduced-Order


Approximation of a System
The balanced state-space model characterized by (4.49) is of particular
interest. Suppose a state-space model (A, b, c, d)n is balanced with Σ as
its controllability and observability Grammian and we partition Σ as
 
Σ1 0
Σ= (4.56)
0 Σ2

where
Σ1 = diag{σ1 , σ2 , · · · , σr }, Σ2 = diag{σr+1 , · · · , σn }
σ1 ≥ σ2 ≥ · · · ≥ σn , σr  σr+1 , 0<r<n

the corresponding partition of (A, b, c, d)n becomes


   
A11 A12 b1  
A= , b= , c = c 1 c2 (4.57)
A21 A22 b2

Since the rth-order subsystem (A11 , b1 , c1 , d)r is associated with the r largest
eigenvalues of K c W o , in a sense it is the closest to the original system
(A, b, c, d)n among all rth-order subsystems. Therefore it is natural to take
(A11 , b1 , c1 , d)r to be a r-order approximation of (A, b, c, d)n . We remark
that the reduced-order system (A11 , b1 , c1 , d)r is always stable for all r in
the range [1, n − 1] as long as the original system (A, b, c, d)n is stable. The
interested reader is referred to Section 8.5.3 for details.

4.5 Kalman’s Canonical Structure Theorem


Given a state-space model (A, b, c, d)n in (4.1), consider its transfer function
in (4.12) in terms of the impulse response, namely H(z) = d + cbz −1 +
cAbz −2 + cA2 bz −3 + · · · . Since d involves only a direct path from the input
to the output, it does not affect the procedure described below. Hence, for
simplicity, d is assumed to be zero, and the state-space model (A, b, c, d)n
with d = 0 in (4.1) will be denoted by (A, b, c)n henceforth.

Definition 4.6
Given a state-space model (A, b, c)n , the controllable state-space X c is
defined as
82 State-Space Models

X c = {x ∈ Rn | x ∈ Range [V n ]} (4.58)
where R and Rn denote the set of all real numbersand the set of all ordered

n-tuples of real numbers, respectively, and V n = b Ab · · · An−1 b is the
controllability matrix.

Definition 4.7
Given a state-space model (A, b, c)n , the uncontrollable state-space X u is
defined as
X u = {x ∈ Rn | x ∈ Null [U n ]} (4.59)
 T
where U n = cT (cA)T · · · (cAn−1 )T is the observability matrix.
As is shown in (4.4) and (4.5), the system (A, b, c)n is controllable
(observable) if and only if rank V n = n (rank U n = n). From Definitions
4.6 and 4.7, we obtain

X c = Range [V n ], X u = Null[U n ] (4.60)

Lemma 4.1
The controllable state-space X c is invariant under matrix transformation A.

Proof
Using (4.60) and applying the Cayley-Hamilton theorem in (4.24) to the
controllability matrix U n , it follows that

AX c = Range [AV n ] ⊆ Range [V n+1 ] = Range [V n ] = X c (4.61)

Hence, the state-space X c is invariant under matrix transformation A. 

Lemma 4.2
The unobservable state-space X u is invariant under matrix transformation A.

Proof
Making use of (4.60) and applying the Cayley-Hamilton theorem in (4.24) to
the observability matrix U n , we have

Null [U n ]A ⊇ Null [U n+1 ] = Null [U n ] = X u (4.62)


4.5 Kalman’s Canonical Structure Theorem 83

Thus, for any x ∈ X u , Ax ∈ X u since U n Ax = 0. Namely, the state-space


X u is invariant under matrix transformation A. This completes the proof of
Lemma 4.2. 
In terms of the properties of controllability and observability, or more
precisely, in terms of the subspaces X c and X u , the structure of state
space Rn can be exposed as the direct sum of four subspaces X1 , X2 , X3 ,
and X4 :

X1 = X c ∩ X u , X c = X1 ⊕ X2
(4.63)
Xu = X1 ⊕ X3 , Rn = X1 ⊕ X2 ⊕ X3 ⊕ X4

where dimension [Xi ] = ni for i = 1, 2, 3, 4 and n1 + n2 + n3 + n4 = n.


The theorem below follows on the basis of the above decomposition.

Theorem 4.3: Kalman’s Canonical Structure Theorem


By performing an appropriate equivalent transformation x(k) = T −1 x(k),
the system (A, b, c)n can be transformed into an equivalent system (A, b, c)n
with the canonical structure
⎡ ⎤ ⎡ ⎤
A11 A12 A13 A14 b1
⎢ 0 A 0 A24 ⎥ ⎥ ⎢ ⎥
⎢ 22 ⎢b2 ⎥
A=⎢ ⎥, b=⎢ ⎥
⎣ 0 0 A33 A34 ⎦ ⎣0⎦
(4.64)
0 0 0 A44 0
 
c = 0 c2 0 c4

where the sizes of submatrices are as follows: Aij is ni ×nj for i, j = 1, 2, 3, 4;


bi is ni × 1 for i = 1, 2 and ci is 1 × ni for i = 2, 4.

Proof
Let T i for i = 1, 2, 3, 4 be an n × ni real matrix whose columns consist of
the basis of the subspace Xi and define an equivalent transformation matrix
T by
 
T = T1 T2 T3 T4 (4.65)
84 State-Space Models

From Lemmas 4.1 and 4.2, Equations (4.63) and (4.65), and b ∈ X c , it follows
that
 
AT = AT 1 AT 2 AT 3 AT 4
⎡ ⎤
A11 A12 A13 A14
 ⎢⎢ 0 A22 0 A24 ⎥ ⎥
= T1 T2 T3 T4 ⎢ ⎥
⎣ 0 0 A33 A34 ⎦
0 0 0 A44 (4.66)
⎡ ⎤
b1
 ⎢⎢b2 ⎥

b = T1 T2 T3 T4 ⎢ ⎥
⎣0⎦
0
which yields A = T −1 AT and b = T −1 b in (4.64). By virtue of (4.60),
(4.63) and (4.65), we have
   
cT = cT 1 cT 2 cT 3 cT 4 = 0 c2 0 c4 (4.67)
which leads to c = cT in (4.64). This completes the proof of Theorem 4.3. 
The canonical decomposition is illustrated in Figure 4.2, where the
subsystem Sc o = (A22 , b2 , c2 )n2 is controllable and observable, the sub-
system Sc o = (A11 , b1 , 0)n1 is controllable and unobservable, the subsystem
Sc o = (A33 , 0, 0)n3 is uncontrollable and unobservable, and the subsystem
Sc o = (A44 , 0, c4 )n4 is uncontrollable and observable.

Corollary 4.1
The impulse response as well as the transfer function of the system (A, b, c)n
are the same as those of the subsystem (A22 , b2 , c2 )n2 .

Figure 4.2 Canonical decomposition of a state-space model.


4.6 Hankel Matrix and Realization 85

Proof
i−1
It is evident that cAi−1 b = c A b = c2 Ai−1
22 b2 holds for i = 1, 2, 3, · · ·
and hence H(z) = c(zI n − A) b = c2 (zI n2 − A22 )−1 b2 . This completes
−1

the proof of Corollary 4.1. 


It is noted that if a system (A, b, c)n is controllable and observable, this
system is said to be minimal.

4.6 Hankel Matrix and Realization


4.6.1 Minimal Realization
Definition 4.8
Given a sequence of real numbers {hi | i = 1, 2, 3, · · · }, a triple {A, b, c} of
constant matrices is said to be the minimal realization of an input-output map
if hi = cAi−1 b holds for i = 1, 2, 3, · · · , and the size of A is minimal among
all the realizations.
The problem of finding such a realization, if it does exist, is known as the
minimal realization problem.

Definition 4.9
Given a sequence of real numbers {hi | i = 1, 2, 3, · · · }, the Hankel matrix
H i,j is defined as
⎡ ⎤
h1 h2 ··· hj
⎢h hj+1 ⎥
⎢ 2 h3 ··· ⎥
H i,j = ⎢
⎢ .. .. .. .. ⎥
⎥ (4.68)
⎣. . . . ⎦
hi hi+1 · · · hi+j−1

Definition 4.10
Let σ K be a shift operator satisfying
⎡ ⎤
h1+K h2+K ··· hj+K
⎢h hj+1+K ⎥
⎢ 2+K h3+K ··· ⎥
σ K H i,j = ⎢
⎢ .. .. .. ..

⎥ (4.69)
⎣ . . . . ⎦
hi+K hi+1+K · · · hi+j−1+K

where matrix H i,j is defined by (4.68).


86 State-Space Models

Theorem 4.4
A sequence of real numbers {hi | i = 1, 2, 3, · · · } has finite-dimensional
realization if and only if there exist integers M and N such that

rank H M,N = rank H M +i,N +j (4.70)

for all i, j = 0, 1, 2, · · · .

Theorem 4.5
If a sequence of real numbers {hi | i = 1, 2, 3, · · · } has finite-dimensional
realization, and if integers M and N satisfy the condition in (4.70), then the
minimal dimension no of realizing the sequence can be determined as

no = rank H M,N (4.71)

Given a sequence {hi | i = 1, 2, 3, · · · } satisfying the condition in (4.70), the


two algorithms described below compute a minimal realization (A, b, c)no
such that hi = cAi−1 b for i = 1, 2, 3, · · · .

1. Direct Realization Algorithm


By applying maximum rank decomposition to the Hankel matrix H M,N , we
obtain
H M,N = U M V N (4.72)
where
rank U M = rank V N = no
and a minimal realization (A, b, c)no can be constructed as

A = (U TM U M )−1 U TM (σH M,N )V TN (V N V TN )−1


b = first column of V N (4.73)
c = first row of U M

2. Canonical Form Realization Algorithm


If the condition in (4.70) is satisfied, we can write

rank H M,no = rank H M,no +1 = no (4.74)

As a result, there exists a unique real vector a = [ano , · · · , a2 , a1 ]T satisfying


4.6 Hankel Matrix and Realization 87

H M,no a = η (4.75)
which leads to
a = (H TM,no H M,no )−1 H TM,no η (4.76)

where
η = the last column of H M,no +1

Therefore, based on (4.28), a minimal realization (A, b, c)no can be found as


⎡ ⎤ ⎡ ⎤
0 · · · 0 −ano 1
⎢ . . .. ⎥ ⎢0⎥  
⎢ 1 . . .. . ⎥ ⎢ ⎥
A=⎢ . . ⎥, b = ⎢ .. ⎥ , c = h1 h2 · · · hno
⎣ .. . . 0 −a ⎦ ⎣.⎦
2
0 · · · 1 −a1 0
(4.77)

4.6.2 Minimal Partial Realization


Definition 4.11
A triple {A, b, c} of constant matrices is said to be the partial realization of
an input-output map if hi = cAi−1 b holds for i = 1, 2, · · · , N .

Definition 4.12
A triple {A, b, c} of constant matrices is said to be the minimal partial
realization of an input-output map if the size of A is minimal among all
the partial realizations satisfying Definition 4.11.
The problem of finding a realization that satisfies Definition 4.12 will be
called the minimal partial realization problem.

Theorem 4.5
If
rank H λ, μ = rank H λ, μ+1 = rank H λ+1, μ (4.78)

holds for some positive integers λ and μ with λ + μ = N , then finite sequence
{hi | i = 1, 2, · · · , N } admits the partial realization.
We now consider the minimal partial realization problem from a given
finite sequence such that the rank condition in (4.78) is not satisfied.
88 State-Space Models

Definition 4.13
Given a finite sequence of real numbers {hi | i = 1, 2, · · · , N }, the incomplete
Hankel matrix H N N is defined as
⎡ ⎤
h1 h2 · · · hN
⎢ .. ⎥
⎢h ..
.
∗ ⎥
⎢ 2 . ⎥
H N,N = ⎢ ⎥ (4.79)
⎢ .. . .. ⎥
.
⎣ . hN . . ⎦
hN ∗ · · · ∗

where the asterisks denote scalars which extend the given sequence without
affecting the rank of the Hankel matrix.
It is well known [3] that a lower bound for the dimension of a state-
space model realizing a given finite sequence is provided by the rank of the
incomplete Hankel matrix. Since the matrix elements denoted by asterisks
do not change the rank of the Hankel matrix, the rank of the incomplete
Hankel matrix can be readily determined by testing its columns for dependency
starting from the left and ignoring all scalars denoted by asterisks. Based on
the observations made above, we have

Theorem 4.6
The rank of the incomplete Hankel matrix in (4.79), denoted by n(N ), can be
computed as

n(N ) = rank H N,1 + (rank H N −1,2 − rank H N −1,1 )

+ · · · + (rank H 1,N − rank H 1,N −1 )


(4.80)
N N −1
= rank H N −i+1,i − rank H N −i,i
i=1 i=1

Explicitly, the dimension n(N ) is the minimal dimension for the realizations
of a given sequence {hi | i = 1, 2, · · · , N }. This dimension is non-decreasing
regardless of the choice of those elements denoted by asterisks in the
incomplete Hankel matrix in (4.79).
Described below is a concrete procedure for finding the elements denoted
by asterisks such that the rank of the Hankel matrix so constructed remains to
be equal to that of the incomplete Hankel matrix.
4.6 Hankel Matrix and Realization 89

Step 1: Obtain an n(N ) × 1 real vector α = [αn(N ) , · · · , α2 , α1 ]T satisfying


⎡ ⎤ ⎡ ⎤
hn(N )+1 h1 h2 · · · hn(N )
⎢h ⎥ ⎢ · · · hn(N )+1 ⎥
⎢ n(N )+1 ⎥ ⎢ h2 h3 ⎥
⎢ ⎥=⎢ ⎥α (4.81)
⎢ .. ⎥ ⎢ .. .. .. .. ⎥
⎣ . ⎦ ⎣ . . . . ⎦
hN hN −n(N ) hN −n(N )+1 · · · hN −1

Step 2: Generate the elements of the extended sequence in order using


 
hN +i = hN −n(N )+i hN −n(N )+1+i · · · hN −1+i α (4.82)

for i = 1, 2, · · · , n(N ).
Note that although α is not necessarily determined uniquely, its existence
is ensured from (4.79) and (4.80). Hence

rank H N,n(N ) = rank H N +1,n(N ) = rank H N,n(N )+1 = n(N ) (4.83)

which coincides with the condition in (4.78). Therefore, the minimal partial
realization problem can be readily solved by applying a method similar to those
studied in Section 4.6.1. For example, a simple minimal partial realization
(A, b, c)n(N ) can be found by the controllable canonical form
⎡ ⎤ ⎡ ⎤
0 · · · 0 −αn(N ) 1
⎢ ⎥
⎢1 . . .
. .
. ⎥
. ⎢ 0⎥ ⎥  
A=⎢
. . ⎥, b = ⎢ ⎢ .. ⎥ , c = h1 h2 · · · hn(N )
⎢ .. . . ⎥ ⎣.⎦
⎣. . 0 −α2 ⎦
0 ··· 1 −α1 0
(4.84)

4.6.3 Balanced Realization


Consider a sequence {hi | i = 1, 2, 3, · · · } that satisfies

|hi | < ∞ (4.85)
i=1

Let H be the Hankel matrix H M,N with sufficiently large integers M and N ,
and H = U V be a maximum rank decomposition of H.
90 State-Space Models

Lemma 4.3
(1) If HH T x = λx with λ = 0 and x = 0, then (U T x)T K c W o =
λ(U T x)T .
(2) If K c W o y = λy with y = 0, then HH T (U y) = λ(U y).

Proof
If HH T x = λx with λ = 0 and x = 0, then xT HH T = λxT which
leads to xT U V V T U T U = λxT U . Hence (U T x)T K c W o = λ(U T x)T
because K c = V V T and W o = U T U . This means that (λ, U T x) is an
eigenvalue left-eigenvector pair for K c W o because U T x = 0. Conversely,
if K c W o y = λy with y = 0, then U V V T U T U y = λU y, namely,
HH T (U y) = λ(U y). Hence, x = U y = 0 is an eigenvector for HH T .
This completes the proof of the lemma. 
By Lemma 4.3, nonzero eigenvalues of HH T coincide with those of
K c W o . Hence H can be factorized using singular value decomposition
(SVD) as
H = U o ΣV To (4.86)
where U To U o = V To V o = I n and Σ is defined by (4.44) (i.e., Σ =
diag{σ1 , σ2 , · · · , σn } and σ1 ≥ σ2 ≥ · · · ≥ σn > 0). In this case,
H T H = V o Σ2 V To and HH T = U o Σ2 U To hold.
As demonstrated below, several useful realizations can be deduced based
on above analysis.

1. Balanced Realization
1 1
By letting U = U o Σ 2 and V = Σ 2 V To in H = U V , we obtain K c =
W o = Σ which gives a balanced realization (A, b, c)n as follows.
1 1
A = Σ− 2 U To σHV o Σ− 2
1 1
(4.87)
T
b = first column of Σ V2
o, c = first row of U o Σ 2

2. Input-Normal Realization
By letting U = U o Σ and V = V To in H = U V , we obtain K c = I n and
W o = Σ2 which gives an input-normal realization (A, b, c)n as follows.

A = Σ−1 U To σHV o
(4.88)
b = first column of V To , c = first row of U o Σ
4.7 Discrete-Time Lossless Bounded-Real Lemma 91

3. Output-Normal Realization
By letting U = U o and V = ΣV To in H = U V , we obtain K c = Σ2 and
W o = I n which gives an output-normal realization (A, b, c)n as follows.

A = U To σHV o Σ−1
(4.89)
b = first column of ΣV To , c = first row of U o

Obviously, these realizations correspond to the three normalized state-space


models of Section 4.4.3.

4. Reduced-Order Approximation
Using SVD of the Hankel matrix and the balanced realization induced from
it, an algorithm for reduced-order approximation can be deduced. Let SVD of
H in (4.86) be partitioned as
  T 
  Σ1 0 V o1
H = U o1 U o2 (4.90)
0 Σ2 V o2

where

Σ1 = {σ1 , σ2 , · · · , σr }, Σ2 = {σr+1 , · · · , σn }, σr >> σr+1

A good rth-order approximation (A11 , b1 , c1 )r of (4.87) can be obtained as


−1 − 12
A11 = Σ1 2 U To1 σHV o1 Σ1
1 1
(4.91)
T
b1 = first column of Σ1 V 2
o1 , c1 = first row of U o1 Σ1 2

It is known that Δ1 = U o1 Σ1 V To1 minimizes ||H − Δ1 ||s over all matrices


Δ1 of rank r and ||H − Δ1 ||s = σr+1 where ||A||s max eigenvalue of AT A.
Unfortunately, Δ1 is not a Hankel matrix in general and hence, Δ1 does not
admit an exact realization. However, by employing an algorithm similar to
(4.87), often times a good reduced-order approximation can be attained [7].

4.7 Discrete-Time Lossless Bounded-Real Lemma


The discrete-time lossless bounded-real lemma is a passivity property of
discrete-time systems, which finds applications in network synthesis and
stability analysis. To introduce this important property, consider a state-space
92 State-Space Models

model (A, b, c, d)n described by (4.1), whose transfer function is given by


(4.10). It is assumed that the state-space model (A, b, c, d)n is asymptotically
stable, controllable and observable, i.e., (A, b, c, d)n is a minimal realization.

Definition 4.14
A transfer function H(z) is said to be lossless bounded-real (LBR) if it is
asymptotically stable and |H(ejω )|2 = 1 holds for all ω.
An equivalent characterization of |H(ejω )|2 = 1 for all ω is that for every
finite-energy input sequence u(k), the output sequence y(k) of the system
satisfies
∞ ∞
|y(k)|2 = |u(k)|2 (4.92)
k=0 k=0

where the initial state vector x(0) is assumed to be null. Note that the
losslessness property is satisfied by all-pass filters.

Theorem 4.7: Discrete-Time LBR Lemma


H(z) is lossless bounded-real (LBR) if and only if there exists a real positive-
definite symmetric matrix P such that [12]

AT P A + c T c = P (4.93a)

bT P b + dT d = 1 (4.93b)
A T P b + cT d = 0 (4.93c)
Proof: (Sufficiency) Assuming that the equations in (4.93) holds true,
Equation (4.93a) can be written as

P = AT [AT P A + cT c]A + cT c

= (AT )2 P A2 + (cA)T cA + cT c
(4.94)
= ···

= (AT )n P An + U Tn U n

where U n = [cT (cA)T · · · (cAn−1 )T ]T is the observability matrix of the


system in (4.1). Recall the standard Lyapunov stability theorem in [13], which
states that matrix B has all eigenvalues in the open unit disk if and only if
there exist two positive-definite symmetric matrices V and W for which
V = B T V B + W . By taking V = P and W = U Tn U n , and noticing
4.7 Discrete-Time Lossless Bounded-Real Lemma 93

the observability of the system in (4.1) hence the nonsingularity of U n , we


conclude that both V and W are positive-definite, hence (4.94) satisfies the
condition in the standard Lyapunov stability theorem, and all eigenvalues of
An must lie inside the open unit disk. Evidently, these eigenvalues are simply
the nth powers of the eigenvalues of A, and thus all eigenvalues of A must
lie inside the open unit disk. Thus H(z) is asymptotically stable.
Next, since P = P T > 0, matrix P can be decomposed as P =
T −T T −1 . Hence, (4.93) can be written as
T
A A + cT c = I n (4.95a)
T
b b + dT d = 1 (4.95b)
T
A b + cT d = 0 (4.95c)
where
A = T −1 AT , b = T −1 b, c = cT
We can now consider an equivalent state-space model of the system H(z)
given by
    
x(k + 1) A b x(k)
= (4.96)
y(k) c d u(k)

where x(k) = T −1 x(k). Alternatively, by defining


 
A b
R= (4.97)
c d

Equation (4.95) can be expressed as

RT R = I n+1 (4.98)

which means that R is an orthogonal matrix. As a result,

||x(k + 1)||2 + |y(k)|2 = ||x(k)||2 + |u(k)|2 (4.99)

holds for any nonnegative integer k, hence


N N
|y(k)|2 = |u(k)|2 + ||x(0)||2 − ||x(N + 1)||2 (4.100)
k=0 k=0
94 State-Space Models

is satisfied for every positive integer N . If we assume that u(k) = 0 for


k > N , then
y(k) = c x(k) for k > N (4.101)
which, by virtue of (4.95), leads to
|y(k)|2 = x(k)T cT c x(k)
  (4.102)
T
= x(k)T I n − A A x(k)
Hence,
∞ ∞
2
 
|y(k)| = ||x(k)||2 − ||x(k + 1)||2
k=N +1 k=N +1 (4.103)
= ||x(N + 1)||2
Equations (4.100) and (4.103) result in
∞ ∞
|y(k)|2 = |u(k)|2 + ||x(0)||2 (4.104)
k=0 k=0

for every finite-energy input that is identically zero for k > N , where N is an
arbitrary finite positive integer. This reveals that H(z) is LBR.
(Necessity) Assume that H(z) is LBR. Let a minimal realization of H(z)
be given by (A, b, c, d)n . Since H(z) is asymptotically stable, the matrix
defined by

P = (AT )i cT cAi (4.105)
i=0
is a symmetric positive-definite matrix and satisfies
P = AT P A + cT c (4.106)
By decomposing matrix P in (4.105) as P = T −T T −1 , and defining a state-
space model (A, b, c, d)n as in (4.96), we can derive
T
I n = A A + cT c (4.107)
from (4.106). Next, by LBR property of H(z),
∞ ∞
|y(k)|2 = |u(k)|2 (4.108)
k=0 k=0
4.8 Summary 95

holds for any finite-energy input, under the assumption that the initial states
are zero. In particular, suppose u(k) = 0 for k > N where N is an arbitrary
finite positive integer, then y(k) = c x(k) holds for k > N , hence
 T 
|y(k)|2 = x(k)T I n − A A x(k)
(4.109)
= ||x(k)||2 − ||x(k + 1)||2 for k > N

is obtained by (4.107). Thus, (4.108) can be written as

N N
|y(k)|2 + ||x(N + 1)||2 = |u(k)|2 (4.110)
k=0 k=0

because

|y(k)|2 = ||x(N + 1)||2 (4.111)
k=N +1

By replacing N by N + 1 and then subtracting, we obtain


   
 T
 x(N + 1)  T
 x(N )
x(N + 1) y(N ) = x(N ) u(N ) (4.112)
y(N ) u(N )

for any finite positive integer N . This means that R defined by (4.97) is
orthogonal, and we arrive at (4.93). This completes the proof of Theorem 4.7.

4.8 Summary
In this chapter, we have presented fundamental properties of the state-space
description of linear causal dynamical systems, a method for deriving its
transfer function from a given state-space model, the concepts of the equivalent
transformation, the canonical structure decomposition, and methods for state-
space realization using the Hankel matrix. Both the state-space description
and the external (input-output) description of linear causal dynamical systems
are of significance and useful in practice. However, when it comes to choosing
the most appropriate description, the matter is often problem dependent. In
addition, we have studied a passivity property of discrete-time systems, known
as lossless bounded-real lemma, which finds applications in network synthesis
and stability analysis.
96 State-Space Models

References
[1] R. E. Kalman, “Mathematical description of linear systems,” J. SIAM
Contr., Ser. A, vol. 1, no. 2, pp. 152–192, 1963.
[2] R. E. Kalman, P. L. Falb and M. A. Arbib, Topics in Mathematical System
Theory, New York, McGraw-Hill, 1969.
[3] A. J. Tether, “Construction of minimal linear state-variable models from
finite input-output data,” IEEEE Trans. Automat. Contr., vol. AC-15,
no. 4, pp. 427–436, Aug. 1970.
[4] H. Kogo and T. Mita, Introduction to System Control Theory, Tokyo,
Japan, Jikkyo Shuppan, 1979.
[5] T. Hinamoto, “Realizations of a state-space model from two-dimensional
input-output map,” IEEE Trans. Circuits Syst., vol. CAS-27, no. 1,
pp. 36–44, Jan. 1980.
[6] V. C. Klema and A. J. Laub, “The singular value decomposition: Its
computation and some applications,” IEEEE Trans. Automat. Contr.,
vol. AC-25, no. 2, pp. 164–176, Apr. 1980.
[7] L. M. Silverman, “Optimal approximation of linear systems,” in Proc,
Joint Automat. Contr. Conf., S. F., 1980, FA8-A.
[8] T. Hinamoto and F. W. Fairman, “Separable-denominator state-space
realization of two-dimensional filters using a canonic form,” IEEE Trans.
Acoust. Speech, Signal Process., vol. ASSP-29, no. 4, pp. 846–853,
Aug, 1981.
[9] J. R. Sveinsson and F. W. Fairman, “Minimal balanced realization of
transfer function matrices using Markov parameters,” IEEEE Trans.
Automat. Contr., vol. AC-30, no. 10, pp. 1014–1016, Oct. 1985.
[10] T. Hinamoto, S. Maekawa, J. Shimonishi and A. N. Venetsanopou-
los, “Balanced realization and model reduction of 3-D separable-
denominator transfer functions,” Franklin Institute, vol. 325, no. 2,
pp. 207–219, 1988.
[11] M. S. Santina, A. R. Stubberud and G. H. Hostetter, Digital Control
System Design, 2nd ed. Orlando, FL, Saunders College Publishing,
Harcourt Brace College Publishers, 1994.
[12] P. P. Vaidyanathan, “The discrete-time bounded-real lemma in digital
filtering,” IEEE Trans. Circuits Syst., vol. CAS-32, no. 9, pp. 918–924,
Sep. 1985.
[13] R. E. Kalman and J. Bertram, “Control system design via the second
method of Liapunov, part II, discrete time systems,” ASME J. Basic
Engineering, vol. 82, pp. 394–400, 1960.
5
FIR Digital Filter Design

5.1 Preview
Digital filters with finite sequence of the impulse response are called FIR
digital filters, nonrecursive digital filters or digital transversal filters, where
“FIR” is the acronym of terms, “Finite Impulse Response”. A general FIR
digital filter of order N − 1 is described by
N
 −1
y(k) = hi u(k − i) (5.1)
i=0

where u(k) and y(k) are scalar input and output, respectively, and hi for
i = 0, 1, · · · , N − 1 denote the impulse response. The transfer function of the
FIR digital filter in (5.1) can be expressed as

H(z) = h0 + h1 z −1 + · · · + hN −1 z −(N −1) (5.2)

A block diagram of an FIR digital filter in (5.1) is depicted in Figure 5.1.


FIR digital filters are the preferred filtering scheme in many DSP applica-
tions, mainly due to the advantages of the FIR digital filters as compared to
their IIR counterparts, i.e.,
1. FIR digital filters are always stable.
2. Exact linear-phase response can easily be achieved by imposing either
symmetric or antisymmetric condition on the FIR filter’s coefficients.
3. FIR digital filters possess low output noise due to coefficient quantization
and multiplication roundoff errors.
4. Effective methods for the design of a variety of FIR digital filters are
available.
On the other hand, in the case of designing IIR digital filters, stability is
always a concern. For IIR digital filters, exact linear-phase responses cannot

97
98 FIR Digital Filter Design

u(k) z -1 z -1 ... z -1

h0 h1 h2 hN-2 hN-1
... y(k)
Figure 5.1 A block diagram of an FIR digital filter.

be realized in general, even in the passband. Coefficient sensitivity as well as


output roundoff noise due to multiplications often become severe and therefore
particular cares might be taken to deal with these problems.
The main disadvantage of FIR digital filters is that the order of an FIR
digital filter is usually considerably higher than its IIR counterpart to meet the
same design specification, especially when the transition bands are narrow.
As a result, the implementation of an FIR digital filter with narrow transition
bands is often costly.

5.2 Filter Classification


The frequency response of a digital filter can be described by

H(ejω ) = M (ω)ejθ(ω) (5.3)

where M (ω) is the magnitude response of the filter, and θ(ω) is the phase
characteristic of the filter. The ideal magnitude responses can commonly be
used to classify digital filters. Even if such digital filters are not realizable,
they can be approximated in practice with some acceptable tolerance.
The magnitude responses of the four typical types of ideal digital filters
are illustrated in Figure 5.2. For lowpass filter of Figure 5.2(a), the passband
and the stopband are given by 0 ≤ ω ≤ ωp and ωp < ω ≤ π, respectively.
For highpass filter of Figure 5.2(b), the stopband is given by 0 ≤ ω < ωp ,
while the passband is given by and ωp ≤ ω ≤ π, respectively. For bandpass
filter of Figure 5.2(c), the passband region is given by ωp1 ≤ ω ≤ ωp2 , and
the stopband regions are specified by 0 ≤ ω < ωp1 and ωp2 < ω ≤ π.
Finally, for bandstop filter of Figure 5.2(d), the passband regions are given
by 0 ≤ ω ≤ ωp1 and ωp2 ≤ ω ≤ π, while the stopband region is ωp1 <
ω < ωp2 . The frequencies ωp , ωp1 , and ωp2 are called the passband edges
of their respective filters. It is observed from the figure that an ideal filter
5.2 Filter Classification 99

M(ω) M(ω)
passband passband
1 1

stopband stopband
0 ωp π ω 0 ωp π ω
(a) (b)
M(ω) M(ω)
passband passband passband
1 1

stopband stopband stopband


0 ωp1 ωp2 π ω 0 ωp1 ωp2 π ω
(c) (d)
Figure 5.2 Four types of ideal filters. (a) Ideal lowpass filter. (b) Ideal highpass filter.
(c) Ideal bandpass filter. (d) Ideal bandstop filter.

has the magnitude response equal to unity in the passband and zero in the
stopband.
The specifications of magnitude responses with some acceptable tolerance
for the four typical types of digital filters are shown in Figure 5.3. For the
digital filters of Figure 5.3, the passband and the stopband allow to have some
acceptable tolerance, respectively, while the transition band regions are free
from any specifications.
For the design of a lowpass FIR digital filter, a few formulas exist for
estimating the minimum value of filter length N directly from the digital
filter specifications. Let ωp and ωs denote the normalized passband edge
frequency and the normalized stopband edge frequency, respectively, and let
δp and δs indicate the peak passband ripple and the peak stopband ripple,
respectively.

Kaiser’s Formula
A simple formula for estimating filter length N which meets the desired
specifications is given by [3]

−20 log10 ( δp δs ) − 13
N (5.4)
14.6Δ
100 FIR Digital Filter Design

M(ω) M(ω)

1 1
δp δp
δs δs

0 ωp ωs π ω 0 ωs ωp π ω
(a) (b)
M(ω) M(ω)

1 1
δp δp1 δp2
δs1 δs2 δs

0 ωs1 ωp1 ωp2 ωs2 π ω 0 ωp1 ωs1 ωs2 ωp2 π ω


(c) (c)
Figure 5.3 Typical magnitude response specifications. (a) Lowpass filter. (b) Highpass filter.
(c) Bandpass filter. (d) Bandstop filter.

where Δ is the transition band width normalized by sampling frequency,


namely,
ωs − ωp
Δ=

Bellanger’s Formula
Another simple formula for estimating filter length N to meet the desired
specifications is given by [3]
2 log10 (10δp δs )
N − −1 (5.5)

where Δ is defined in (5.4).

5.3 Linear-phase Filters


5.3.1 Frequency Transfer Function
The frequency transfer function of the filter in (5.2) can be obtained by setting
z = ejω as
N−1
H(ejω ) = hi e−jωi = |H(ejω )|ejθ(ω) (5.6)
i=0
5.3 Linear-phase Filters 101

where

N −1 2 N
−1 2
 

|H(e )| =  hi cos iω + hi sin iω
i=0 i=0
⎛ ⎞
N
 −1
⎜ hi sin iω ⎟
⎜ ⎟
⎜ i=0 ⎟
θ(ω) = − tan−1 ⎜ N −1 ⎟
⎜  ⎟
⎝ hi cos iω ⎠
i=0

|H(ejω )| and θ(ω) in (5.6) are called the amplitude response and the phase
characteristic, respectively. It is obvious that |H(ejω )| is an even function,
and θ(ω) is an odd function. The phase delay and the group delay for the filter
in (5.6) are defined as
θ(ω) dθ(ω)
τp (ω) = − , τg (ω) = − (5.7)
ω dω
respectively.

5.3.2 Symmetric Impulse Responses


For constant phase delay, θ(ω) must be linear with respect to ω, that is,

θ(ω) = −τ ω (5.8)

From (5.6) and (5.8), it follows that


N
 −1
hi sin iω
i=0 sin τ ω
= (5.9)
N
 −1 cos τ ω
hi cos iω
i=0

which leads to
N −1
   N
−1
hi cos iω sin τ ω − sin iω cos τ ω = hi sin(τ ω − iω) = 0 (5.10)
i=0 i=0
102 FIR Digital Filter Design

The values of τ and hi for i = 0, 1, · · · , N − 1 satisfying (5.10) are given by


N −1
τ= , hi = hN −1−i for i = 0, 1, · · · , N − 1 (5.11)
2
Hence, it is only necessary for the impulse response to be symmetric about
the shifted origin (N − 1)/2. Unlike IIR digital filters, FIR digital filters can
have a linear phase over the entire baseband. The impulse responses which
are symmetric about the shifted origin (N − 1)/2 for odd N as well as even
N are illustrated in Figure 5.4.

1. Symmetric Impulse Response of Even Order N − 1 (Type 1)


Since hi = hN −1−i for i = 0, 1, · · · , N − 1, we write (5.2) as
N −1
−1 N −1

2
N −1 
H(z) = hi z −i + h N −1 z − 2 + hN −1−i z −i
2
i=0 i= N 2−1 +1
N −1 N −1
−1 −1

2 
2
−i − N 2−1
= hi z + h N −1 z + hl z −(N −1−l) (5.12)
2
i=0 l=0
⎡ N −1 ⎤
−1  
N −1
2  N −1 N −1
= z− 2 ⎣ h N −1 + hi z 2
−i
+ z −( 2
−i) ⎦
2
i=0

The frequency transfer function of the filter in (5.12) becomes


⎡ ⎤
N −1

2
−1  
N −1 N −1
H(ejω ) = ⎣ h N −1 + 2hi cos − i ω ⎦ e−j 2 ω
2 2
i=0
⎡ N −1 ⎤ (5.13)
2
N −1
=⎣ ck cos(kω) ⎦ e−j 2 ω
k=0

Center of symmetry Center of symmetry


N = 11 N = 10

hi hi i
i
i = 10 i = 10

(a) (b)
Figure 5.4 Symmetric impulse responses. (a) N is odd. (b) N is even.
5.3 Linear-phase Filters 103

where

c0 = h N −1 , ck = 2h N −1 −k for 1 ≤ k ≤ (N − 1)/2
2 2

2. Symmetric Impulse Response of Odd Order N − 1 (Type 2)


In this case, (5.2) is changed to
N
−1 N −1

2 
H(z) = hi z −i + hN −1−i z −i
i=0 i= N
2
N N
−1 −1

2 
2
−i
= hi z + hl z −(N −1−l) (5.14)
i=0 l=0
⎡N ⎤
−1  N −1 
N −1 
2
N −1
= z− 2 ⎣ hi z 2 −i + z −( 2 −i) ⎦
i=0

The frequency transfer function of the filter in (5.14) can be expressed as


⎡N ⎤

2
−1  
N −1 N −1
H(ejω ) = ⎣ 2hi cos − i ω ⎦ e−j 2 ω
2
i=0
⎡ N ⎤ (5.15)
2  
1 N −1
=⎣ ck cos k − ω ⎦ e−j 2 ω
2
k=1

where
N
ck = 2h N −k for k = 1, 2, · · · ,
2 2
Since cos(k − 1/2)π = 0 in (5.15), it follows that H(ejπ ) = 0. This reveals
that odd-order FIR filters with symmetric impulse response are not suitable
for the design of highpass filters.
By defining
1 1
c1 = c̃1 + c̃0 , c N = c̃ N −1
2 2 2 2
(5.16)
1
ck = (c̃k + c̃k−1 ) for 2 ≤ k ≤ N/2 − 1
2
104 FIR Digital Filter Design

we obtain
N
    

2
1 1 1 1 1
ck cos k − ω = c̃0 cos ω + c̃1 cos ω + cos 1 + ω
2 2 2 2 2
k=1
  
1 1
+ c̃2 cos 1 + ω
2 2
  
1
+ cos 2 + ω + ···
2
   (5.17)
1 N 1
+ c̃ N −1 cos −2+ ω
2 2 2 2
  
N 1
+ cos −1+ ω
2 2
N
−1
1 2

= cos ω c̃k cos(kω)


2
k=0

Substituting (5.17) into (5.15) yields


⎡N ⎤
ω   2
−1
N −1
H(ejω ) = cos ⎣ c̃k cos(kω) ⎦ e−j 2 ω (5.18)
2
k=0

5.3.3 Antisymmetric Impulse Responses


In many applications, only the group delay needs to be constant. A phase
response with constant group delay assumes the form

θ(ω) = −τ ω + θo , 0≤ω≤π (5.19)

Applying the arguments similar to those in (5.9) and (5.10), we arrive at


N
 −1
hi sin(τ ω − iω − θo ) = 0 (5.20)
i=0
5.3 Linear-phase Filters 105

In order to find a solution of (5.20), we set θo = ±π/2. Then, (5.20) is


changed to
N−1
hi cos(τ ω − iω) = 0 (5.21)
i=0
The values of τ and hi for i = 0, 1, · · · , N − 1 satisfying (5.21) are given by
N −1
τ= , hi = −hN −1−i for i = 0, 1, · · · , N − 1 (5.22)
2
Here, the impulse response is required to be antisymmetric about the shifted
origin (N − 1)/2 in which an FIR digital filter has a linear phase over the
entire baseband. The impulse responses which are antisymmetric about the
shifted origin (N − 1)/2 for odd N as well as even N are illustrated in
Figure 5.5.

3. Antisymmetric Impulse Response of Even Order N − 1


(Type 3)
Since hi = −hN −1−i for i = 0, 1, · · · , N − 1 and h N −1 = 0, we can write
2
(5.2) as
N −1
−1 N −1

2 
H(z) = hi z −i − hN −1−i z −i
i=0 i= N 2−1 +1
N −1 N −1
−1 −1

2 2 
= hi z −i − hl z −(N −1−l) (5.23)
i=0 l=0
⎡ N −1 ⎤
−1  N −1 
N −1 
2
N −1
= z− 2 ⎣ hi z 2 −i − z −( 2 −i) ⎦
i=0

Center of antisymmetry Center of antisymmetry


N = 11 N = 10

hi hi x
i
i
i = 10 i = 10

(a) (b)
Figure 5.5 Antisymmetric impulse responses. (a) N is odd. (b) N is even.
106 FIR Digital Filter Design

The frequency transfer function of the filter in (5.23) becomes


⎡ N −1 ⎤

2
−1  
N −1 N −1
H(ejω ) = ⎣ 2hi sin − i ω ⎦ je−j 2 ω
2
i=0
⎡ N −1 ⎤ (5.24)
2

=⎣ ck sin(kω) ⎦ e−jθ(ω)
k=1

where
ck = 2h N −1 −k for 1 ≤ k ≤ (N − 1)/2,
⎧ 2

⎪ π N −1
⎨ 2 − 2 ω
⎪ for ω > 0
θ(ω) =

⎪ π N −1

⎩ − − ω for ω < 0
2 2
Since sin(k0) = sin(kπ) = 0 in (5.24), it follows that H(ej0 ) = H(ejπ ) = 0.
Therefore, even-order FIR filters with antisymmetric impulse response are
inadequate for the design of lowpass and highpass filters.
By defining
1 1
c1 = c̃0 − c̃2 , c N −1 = c̃ N −3
2 2 2 2
(5.25)
1
ck = (c̃k−1 − c̃k+1 ) for 2 ≤ k ≤ (N − 5)/2
2
we have
N −1
2
1 1  
ck sin kω = c̃0 sin ω + c̃1 sin 2ω + c̃2 sin 3ω − sin ω
2 2
k=1
1   1  
+ c̃3 sin 4ω − sin 2ω + c̃4 sin 5ω − sin 3ω
2 2
 
N −1 N −5
+ · · · + c̃ N −3 sin ω − sin ω
2 2 2
N −3
2

= sin ω c̃k cos(kω)


k=0
(5.26)
5.3 Linear-phase Filters 107

Substituting (5.26) into (5.24) gives


⎡ N −3 ⎤
 2

H(ejω ) = sin ω ⎣ c̃k cos(kω) ⎦ e−jθ(ω) (5.27)


k=0

4. Antisymmetric Impulse Response of Odd Order N − 1 (Type 4)


In this case, (5.2) is changed to
N
−1 N −1

2 
−i
H(z) = hi z − hN −1−i z −i
i=0 i= N
2
N N
−1 −1

2 
2
−i
= hi z − hl z −(N −1−l) (5.28)
i=0 l=0
⎡N ⎤
−1  N −1 
N −1 
2
N −1
= z− 2 ⎣ hi z 2 −i − z −( 2 −i) ⎦
i=0

The frequency transfer function of the filter in (5.28) can be expressed as


⎡N ⎤

2
−1  
N −1 N −1
H(ejω ) = ⎣ 2hi sin − i ω ⎦ je−j 2 ω
2
i=0
⎡ N ⎤ (5.29)
 2  
1
=⎣ ck sin k − ω ⎦ ejθ(ω)
2
k=1

where


⎪ π N −1
⎨ 2 − 2 ω
⎪ for ω > 0
ck = 2h N −k for 1 ≤ k ≤ N/2, θ(ω) =
2 ⎪
⎪ π N −1

⎩ − − ω for ω < 0
2 2
Since sin(k − 1/2)0 = 0 in (5.29), it follows that H(ej0 ) = 0. Hence odd-
order FIR filters with antisymmetric impulse response are inadequate for the
design of lowpass filters.
108 FIR Digital Filter Design

By defining
1 1
c1 = c̃0 − c̃1 , cN = c̃ N
2 2 2 2 −1
(5.30)
1
ck = (c̃k−1 − c̃k ) for 2 ≤ k ≤ N/2 − 1
2
we can write
N
    

2
1 1 1 1 1
ck sin k − ω = c̃0 sin ω + c̃1 sin 1 + ω − sin ω
2 2 2 2 2
k=1
     
1 1 1
+ c̃2 sin 2 + ω − sin 1 + ω + ···
2 2 2
  
1 N 1
+ c̃ N −1 sin −1+ ω
2 2 2 2
  
N 1
− sin −2+ ω
2 2
N
−1
1 2

= sin ω c̃k cos(kω)


2
k=0
(5.31)
By substituting (5.31) into (5.29), we obtain
⎡N ⎤
ω   2
−1
H(ejω ) = sin ⎣ c̃k cos(kω) ⎦ e−jθ(ω) (5.32)
2
k=0

5.4 Design Using Window Function


5.4.1 Fourier Series Expansion
Suppose that Hd (ejω ) is the desired frequency response. Since Hd (ejω ) a
periodic function of ω with a period 2π, it can be represented by its Fourier
series as
∞

Hd (e ) = hi e−jωi (5.33)
i=−∞
5.4 Design Using Window Function 109

where the Fourier coefficients given by


 π
1
hi = Hd (ejω )ejωi dω, −∞ < i < ∞ (5.34)
2π −π

correspond precisely to the impulse response samples. Substituting ejω = z


into (5.33) yields
∞
Hd (z) = hi z −i (5.35)
i=−∞

Therefore, the transfer function Hd (z) in (5.35) can be determined by comput-


ing its hi ’s using (5.34). We remark, however, that the corresponding impulse
response is of infinite length and noncausal.
In order to find a finite-duration impulse response sequence {hi } of length
N , we have to truncate the impulse response sequence with finite terms, that is,
N −1
hi = 0 for | i | > (5.36)
2
where N − 1 is assumed to be even. Then, we obtain
N −1
2

H(z) = h0 + (h−i z i + hi z −i ) (5.37)


i=1

and a causal FIR digital filter can then be derived from (5.37) by setting
N −1
Hc (z) = z − 2 H(z).
As an example, consider the design of an ideal lowpass filter with linear
phase whose magnitude response and phase characteristic are shown in
Figure 5.6. The frequency transfer function is given by

1 · e−jωτ for |ω| ≤ ωc


Hd (ejω ) = M (ω)ejθ(ω) = (5.38)
0 for |ω| > ωc
Using (5.34) yields
 ωc
1
hi = 1 · e−jωτ ejωi dω
2π −ωc
! "ωc (5.39)
1 ejω(i−τ ) sin ωc (i − τ )
= =
2π j(i − τ ) π(i − τ )
−ωc
110 FIR Digital Filter Design

M( ) θ (ω)

0 ω

−ωτ
0 c

(a) (b)
Figure 5.6 Ideal lowpass filter characteristics. (a) Magnitude response. (b) Phase
characteristic.

By applying the truncation in (5.36) to (5.39) and setting


N −1
2
N −1
Hc (z) = hi z −(i+ 2
)
(5.40)
i=− N 2−1

we obtain a causal and feasible FIR digital filter of order N − 1.

5.4.2 Window Functions


There exist many tapered windows in the literature, however an introduction
of all these windows is beyond the scope of this text. Our discussion will be
restricted to several well-known tapered windows of length N .
Let an infinite impulse response sequence {hi | − ∞ < i < ∞} be
converted into a finite impulse response sequence {wi hi | | i | ≤ (N − 1)/2}
where {wi | | i | ≤ (N − 1)/2} is said to be the window function. By applying
the z-transform to the finite impulse response sequence {wi hi }, the frequency
transfer function of the resulting causal FIR digital filter can be expressed as
N −1
2
N −1
Hc (ejω ) = wi hi e−jω(i+ 2
)
(5.41)
i=− N 2−1

1. Rectangular Window
The rectangular window is defined by
N −1
1 for | i | ≤ 2
wi = (5.42)
N −1
0 for | i | > 2
5.4 Design Using Window Function 111

Since the Fourier series are truncated outside the N terms, the undesirable
Gibbs phenomenon, which is known to be inherently associated with the
Fourier series near the function’s discontinuities, will occur. Various windows
have been proposed to deal with this problem.

2. Bartlett Window
The Bartlett window is described by [3]

⎨ 1 − 2 | i | for | i | ≤ N −1
N −1 2
wi = (5.43)
⎩ 0 for | i | > N −1
2

3. Generalized Hamming Window


The generalized Hamming window is given by [8]

⎨ α + (1 − α) cos( N2πi
−1 ) for | i | ≤
N −1
2
wi = (5.44)
⎩ 0 for | i | > N −1
2

where 0 ≤ α ≤ 1. The window in (5.44) is called the Hamming window in


case α = 0.54 and the Hanning window in case α = 0.50.

4. Blackman Window
The Blackman window is specified by [3]

⎨ 0.42 + 0.5 cos( N2πi 4πi
−1 ) + 0.08 cos( N −1 ) for | i | ≤
N −1
2
wi =
⎩ 0 for | i | > N −1
2
(5.45)
Continuous profiles of these windows are depicted in Figure 5.7. For illustra-
tion purposes the amplitude responses of the FIR filter specified by (5.39) and
(5.41) with N = 31 and window {wi | − (N − 1)/2 ≤ i ≤ (N − 1)/2} being
each of the above windows are displayed in Figure 5.8.

5.4.3 Frequency Transformation


Suppose that an FIR lowpass filter with cutoff frequency ωc has been designed
and its transfer function is given by
112 FIR Digital Filter Design

Rectangular
1

0.8 Bartlett

0.6 Blackman

0.4 Hanning

0.2 Hamming
0
N−1 0 i N−1
− 2 2
Figure 5.7 Plots of the fixed windows shown with solid lines for clearness.

N
 −1
HLP (z) = hLP
i z
−i
(5.46)
i=0

where {hLP
i } denotes the impulse response of the filter.

1. FIR Highpass Filter


The impulse response of an FIR highpass filter can be expressed as

hHP
i = (−1)i hLP
i for i = 0, 1, · · · , N − 1 (5.47)

The highpass characteristic of {hHP i } can be verified by evaluating its


frequency response in terms of that of the FIR lowpass filter:
N
 −1 N
 −1
HHP (ejω ) = (−1)i hLP
i e
−jωi
= (ejπ )i hLP
i e
−jωi

i=0 i=0
(5.48)
N
 −1
= hLP
i e
−j(ω−π)i
= HLP (ej(ω−π) )
i=0

where the cutoff frequency is π − ωc .


5.4 Design Using Window Function 113

Rectangular window Bartlett window


0 0

−20 −20
Gain, dB

Gain, dB
−40 −40

−60 −60

−80 −80

−100 −100
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
ω/π ω/π
(a) (b)
Hanning window Hamming window
0 0

−20 −20
Gain, dB

Gain, dB
−40 −40

−60 −60

−80 −80

−100 −100
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
ω/π ω/π
(c) (d)
Blackman window
0

−20
Gain, dB

−40

−60

−80

−100
0 0.2 0.4 0.6 0.8 1
ω/π
(e)

Figure 5.8 Gain responses of the fixed window functions.

2. FIR Bandpass Filter


The impulse response of an FIR bandpass filter can be written as

hBP
i = (2 cos ωo i)hLP
i for i = 0, 1, · · · , N − 1 (5.49)

where ωo stands for the center frequency of the passband. The bandpass
characteristic of {hBP
i } can be demonstrated by evaluating its frequency
response in terms of that of the FIR lowpass filter:
114 FIR Digital Filter Design

N
 −1
 
HBP (ejω ) = ejωo i + e−jωo i hLP
i e
−jωi

i=0

N
 −1 N
 −1 (5.50)
= hLP
i e
−j(ω−ωo )i
+ hLP
i e
−j(ω+ωo )i

i=0 i=0
   
= HLP ej(ω−ωo ) + HLP ej(ω+ωo )

3. FIR Bandstop Filter


The impulse response of an FIR bandstop filter can be specified by

hBS BP
0 = 1 − h0 , hBS
i = −hBP
i for i = 1, 2, · · · , N − 1 (5.51)

The bandstop characteristic of {hBS i } can be verified by evaluating its


frequency response in terms of that of the FIR bandpass filter:
N
 −1
HBS (ejω ) = 1 − hBP
0 − hBP
i e
−jωi

i=1

N
 −1 (5.52)
=1− hBP
i e
−jωi

i=0

= 1 − HBP (ejω )
The frequency transformation methods stated above are illustrated in
Figure 5.9.

5.5 Least-Squares Design


5.5.1 Quadratic-Measure Minimization
In a typical design of lowpass filters, the desired frequency response of a
lowpass filter is given by

e−jθ(ω) for |ω| ≤ ωp


D(ejω ) = (5.53)
0 for |ω| ≥ ωs
5.5 Least-Squares Design 115

1 1

−2π −π −ωc 0 ωc π 2π ω −2π −π−ωc −π −π +ωc 0 π−ωc π π +ωc 2π ω

(a) (b)
1 1

−2π −π −ωo 0 ω1ωo ω2 π 2π ω −2π −π −ω2 −ω1 0 ω1 ω2 π 2π ω

(c) (d)

Figure 5.9 Magnitude responses of ideal filters. (a) Ideal lowpass filter. (b) Ideal highpass
filter. (c) Ideal bandpass filter. (d) Ideal bandstop filter.

where ωp and ωs denote the passband edge and the stopband edge, respectively,
and the characteristics of transition band ωp < ω < ωs are not specified.
To minimize a quadratic measure of the passband and stopband error in
the frequency domain, the total error function is formulated as
 ωp  π
E=α |D(ejω ) − H(ejω )|2 dω + β |H(ejω )|2 dω
0 ωs (5.54)
= αEp + βEs

where H(ejω ) is the frequency response of the filter to be designed, Ep


and Es are the passband and stopband errors, respectively, and α and β are
positive weighting parameters which control the relative accuracies of approx-
imation in the passband and stopband, respectively. In general, the larger
the weighting parameter is, the better the performance of its corresponding
band becomes. We remark that weighting parameters can also be functions of
frequency ω.
For simplicity, we assume that H(ejω ) is specified by (5.13), the desired
phase response is chosen as θ(ω) = N 2−1 ω in (5.53), and α and β are chosen
to be constant. Then (5.44) can be expressed as

E = αEp + βEs
  (5.55)
= α cT Ac − 2cT p + q + β cT Bc
116 FIR Digital Filter Design

where
 T  T
c = c0 c1 · · · c N −1 , φ(ω) = 1 cos ω · · · cos( N 2−1 ω)
2
 ωp  ωp  ωp
A= φ(ω)φ(ω)T dω, p= φ(ω)dω, q= dω
0 0 0
 π
B= φ(ω)φ(ω)T dω, c0 = h N −1
2
ωs

ck = 2 h N −1 −k for 1 ≤ k ≤ (N − 1)/2
2

Here, A and B are real, symmetric and positive-definite matrices. By dif-


ferentiating (5.55) with respect to vector c and setting the result to null, we
obtain
dE
= 2 (αA + βB) c − 2αp = 0 (5.56)
dc
which leads to  β −1
c= A+ B p (5.57)
α
This is the optimal least-squares solution which minimizes the quadratic
measure in (5.55).

5.5.2 Eigenfilter Method


Notice that in (5.55) Es = cT Bc is a quadratic form, but Ep = cT Ac −
2pT c+q is not. To express Ep in a quadratic form, note that the zero-frequency
response of the filter in (5.13) is given by
 
H(ej0 ) = 1 1 · · · 1 c = 1T c (5.58)

where vector 1 is defined as 1 = [1 1 · · · 1]T . Therefore, the quantity (1 −


φ(ω))T c represents the deviation of the frequency response H(ejω ) from
the zero-frequency response H(ej0 ). As a result, the error measure for the
passband can be expressed as a quadratic form, that is,

Ep = cT Dc (5.59)

where  ωp   T
D= 1 − φ(ω) 1 − φ(ω) dω
0
5.6 Analytical Approach 117

Hence (5.55) can be written as

E = αEp + βEs = cT (αD + βB) c (5.60)

which is a quadratic form. Since αD+βB is a symmetric and positive-definite


matrix,
cT (αD + βB) c
λmin ≤ ≤ λmax (5.61)
||c||2

always holds for any (N + 1)/2 × 1 vector c where λmin (λmax ) denotes the
minimum (maximum) eigenvalue of αD + βB.
Summarizing the optimal least-squares solution that minimizes E in
(5.60) with respect to vector c is given by the eigenvector c1 of αD + βB
corresponding to its minimum eigenvalue λmin , the minimum value of E is
described by Emin = λmin ||c1 ||2 .

5.6 Analytical Approach


5.6.1 General FIR Filter Design
The frequency transfer function of the filter in (5.2) is given by
N
 −1
H(ejω ) = hi e−jωi (5.62)
i=0

Given a desired frequency response D(ejω ), designing an FIR filter amounts


to obtaining a total of N independent parameters hi ’s of (5.62) in such a way
that a quadratic measure between the designed filter’s magnitude response
H(ejω ) and the desired magnitude response D(ejω ) is minimized.
Suppose the frequency grids in the range 0 ≤ ω < 2π are defined by

ωl = l for l = 0, 1, · · · , M − 1 (5.63)
M
then a quadratic error measure can be written as

E = (W h − d)H (W h − d)
  (5.64)
= hT W H W h − 2hT Re W H d + dH d
118 FIR Digital Filter Design

where
⎡ 2π 2π 2π

e−j0 M 0 e−j0 M 1 ··· e−j0 M (N −1)
⎢ 2π 2π ⎥2π
⎢ e−j1 M 0 e−j1 M 1 ··· e−j1 M (N −1)


W =⎢ ⎥
.. .. .. ⎥ ..
⎣ . . . ⎦ .
2π 2π 2π
−j(M −1) M 0 −j(M −1) M 1 −j(M −1) M (N −1)
e e ··· e

h = (h0 h1 · · · hN −1 )T
 2π 2π 2π
T
d = D(ej0 M ) D(ej1 M ) · · · D(ej(M −1) M )

W H (dH ) denotes the conjugate transpose of matrix W (d), and Re[ · ] stands
for the real part of [ · ]. By differentiating (5.64) with respect to vector h and
setting the result to null, we obtain
dE  
= 2W H W h − 2Re W H d = 0 (5.65)
dh
which leads to
1  
h= Re W H d (5.66)
M
where W H W = M I N because
M
 −1

M for p = q
(p, q)th element of W H W = ejk M (p−q) =
k=0 0 for p = q

The expression in (5.66) provides the optimal least-squares solution which


minimizes the quadratic measure in (5.64). Note that, due to the orthogonality
of W , the formula in (5.66) does not require matrix inversion.

5.6.2 Linear-Phase FIR Filter Design


The magnitude response of the filter in (5.13) is described by
N −1
2

M (ω) = |H(ejω )| = ck cos(kω) (5.67)


k=0

Given a desired magnitude response |D(ejω )|, the problem of designing an


FIR filter is to obtain a total of (N +1)/2 independent parameters ck ’s of (5.67)
5.6 Analytical Approach 119

that minimize a quadratic measure between the designed filter’s magnitude


response M (ω) and the desired magnitude response |D(ejω )| over 0 ≤ ω ≤ π.
Suppose the frequency grids in the range 0 ≤ ω ≤ π are defined by
π
ωl = l for l = 0, 1, · · · , M (5.68)
M
with M > (N − 1)/2, then a quadratic error measure can be written as
E = (V c − d)T (V c − d)
(5.69)
= cT V T V c − 2cT V T d + dT d
where
⎡ π π π N −1

cos(0 M 0) cos(0 M 1) ··· cos(0 M 2 )
⎢ cos(1 π 0) π
cos(1 M 1) ··· π N −1 ⎥
cos(1 M
⎢ M 2 )⎥
V =⎢ .. .. .. .. ⎥
⎣ . . . . ⎦
π π π N −1
cos(M M 0) cos(M M 1) · · · cos(M M 2 )
 T
c = c0 c1 · · · c N −1
2
 π π π
T
d= |D(ej0 M )| |D(ej1 M )| · · · |D(ejM M )|

By differentiating (5.69) with respect to c and setting the result to null, we


obtain
dE
= 2V T V c − 2V T d = 0 (5.70)
dc
which leads to  −1 T
c = V TV V d (5.71)
We now define a matrix R = [Rij ] for i, j = 0, 1, · · · , (N − 1)/2 as
R = V TV (5.72)
Obviously, matrix R is symmetric whose (i, j)th element is given by
M
  π   π 
Rij = cos k i cos k j (5.73)
M M
k=0

In addition, because M > (N − 1)/2, R is nonsingular, hence R−1 exists


and is also symmetric. By substituting (5.72) into (5.71), we obtain
c = R−1 V T d (5.74)
120 FIR Digital Filter Design

   N −1
Table 5.1 R−1 = [λij ] for 0 ≤ i, j ≤ N and N < M with N =
2
 
i, j λij (N odd) λij (N even)
 
M +N −1 M +N
i=j=0
M (M + N  ) M (M + N  + 1)
i = 0 and j even, 2 2
− −
or j = 0 and i even M (M + N  ) M (M + N  + 1)
 
2(M + N − 1) 2(M + N − 2)
i = j = 0 and i, j odd
M (M + N  + 1) M (M + N  )
 
2(M + N − 2) 2(M + N − 1)
i = j = 0 and i, j even
M (M + N  ) M (M + N  + 1)
4 4
i = j and i, j odd − −
M (M + N  + 1) M (M + N  )
4 4
i = j and i, j even − −
M (M + N  ) M (M + N  + 1)

(i + j) odd 0 0

The elements of R−1 = {λij | 0 ≤ i, j ≤ (N − 1)/2} can be found in


Table 5.1 [6].

5.7 Chebyshev Approximation


5.7.1 The Parks-McClellan Algorithm
From (5.13), the frequency response H(ejω ) of a causal linear-phase FIR filter
of even order N − 1 is described by (Type 1)
N −1
H(ejω ) = M (ω)e−j 2
ω
(5.75)
where
N −1
2

M (ω) = ck cos(kω)
k=0
Here, the amplitude response M (ω) of the filter is a real function of frequency
ω. Given a desired amplitude response D(ω), the weighted error function is
defined as
5.7 Chebyshev Approximation 121
 
ε(ω) = W (ω) M (ω) − D(ω) (5.76)
where W (ω) is a positive weighting function which controls the relative size
of the peak error in the specified frequency band. The problem of designing
an FIR digital filter, in this case, is to iteratively adjust the coefficients ci ’s
of the amplitude response M (ω) so that the peak absolute value of ε(ω) is
minimized.
Suppose the minimum of the peak absolute value of ε(ω) in a band ωa ≤
ω ≤ ωb is εo , then the absolute value satisfies

|ε(ω)| = |W (ω)| |M (ω) − D(ω)| ≤ εo for ωa ≤ ω ≤ ωb (5.77)

Typically, the desired amplitude response is specified by

1, in passband
D(ω) = (5.78)
0, in stopband

and it is also required that the amplitude response M (ω) satisfies the above
desired response with a ripple of ±δp in the passband and a ripple of δs in
the stopband. Hence, from (5.76) the weighting function can be chosen as
either of
1, in passband
W (ω) = (5.79)
δp /δs , in stopband
and
δs /δp , in passband
W (ω) = (5.80)
1, in stopband
The optimization problem encountered here is to determine the coefficients ci ’s
of M (ω) in (5.75) that minimize the peak absolute value ε of the weighted
approximation error ε(ω) of (5.76) over specified frequency bands R. As
will be shown below, this problem can be solved by applying the alternation
theorem from the theory of Chebyshev approximation [3].

5.7.2 Alternation Theorem


The alternation theorem [3] can be stated as follows: the amplitude response
M (ω) in (5.75) obtained by minimizing the peak absolute value ε of ε(ω) in
(5.76) is the optimal unique approximation of the desired amplitude response if
and only if there are at least (N +3)/2 extremal frequencies ω0 , ω1 , · · · , ω N +1
2
in a closed subset R of the frequency range 0 ≤ ω ≤ π such that
122 FIR Digital Filter Design

ω0 < ω1 < · · · < ω N −1 < ω N +1 and ε(ωi ) = −ε(ωi+1 ) with |ε(ωi )| = ε


2 2
for all i in the range 0 ≤ i ≤ (N + 1)/2.
When the approximation error ε(ω) for amplitude response M (ω) satisfies
the condition of the above theorem, the peaks of ε(ω) occur at ω = ωi for
0 ≤ i ≤ (N + 1)/2 in which

dε(ω)
=0 (5.81)

Since W (ω) and D(ω) are piecewise constant in the passband and the
stopband, from (5.76) it follows that
% %
dε(ω) %% dM (ω) %%
= =0 (5.82)
dω %ω=ωi dω %ω=ωi

which implies that the magnitude response M (ω) also has peaks at ω = ωi .
The Chebyshev polynomials of first kind is defined as

Tn (x) = cos(nω) with x = cos(ω) (5.83)

hence
T0 (x) = 1 because cos(0 ω) = 1
T1 (x) = x because cos(1ω) = cos(ω)
T2 (x) = 2x2 − 1 because cos(2ω) = 2 cos2 (ω) − 1
T3 (x) = 4x3 − 3x because cos(3ω) = 4 cos3 (ω) − 3 cos(ω)
..
.
Tn+1 (x) = 2xTn (x) − Tn−1 (x) for n = 1, 2, · · · in general
(5.84)
The amplitude response M (ω) in (5.75) can be expressed as a power series in
cos(ω), i.e.,
N −1
2

M (ω) = αk cosk (ω) (5.85)


k=0

This equation is a polynomial of order (N − 1)/2 in cos(ω), hence M (ω)


can have at most (N − 3)/2 local minima inside the specified passband and
5.7 Chebyshev Approximation 123

stopband. Also, note that |ε(ω)| is a maximum at the band edges ω = ωp and
ω = ωs and hence, M (ω) has extrema at these frequencies. Moreover, M (ω)
may also have extrema at ω = 0 and ω = π. As a result, there exist at most
(N + 3)/2 extremal frequencies.
To obtain the optimal solution for the unknown ck ’s and ε under the
assumption that the (N + 3)/2 extremal frequencies are known, we need
to solve the set of (N + 3)/2 equations
W (ωi ) [M (ωi ) − D(ωi )] = (−1)i ε for 0 ≤ i ≤ (N + 1)/2 (5.86)
which is equivalent to
⎡ ⎤
1 cos(ω0 ) ··· cos( N 2−1 ω0 ) −1
W (ω0 )
⎢ ⎥⎡ ⎤
⎢1 cos(ω1 ) ··· cos( N 2−1 ω1 ) 1
W (ω1 ) ⎥ c0
⎢ ⎥
⎢ .. .. .. .. .. ⎥ ⎢ c1 ⎥
⎢. . . . . ⎥⎢ ⎥
⎢    N −1  N −3 ⎥ ⎢ .. ⎥
⎢ ⎥ ⎢ ⎥
⎢ 1 cos ω N −1 · · · cos 2 ω N −1  2 ⎥ ⎢ . ⎥
(−1)
⎢ 2 2 W ω N −1 ⎥ ⎣c N −1 ⎦
⎢ ⎥
⎢ N −1 ⎥
2
ε
2
⎣    N +1  (−1) ⎦
1 cos ω N +1 · · · cos 2 ω N +1  2

2 2 W ω N +1 (5.87)
2
⎡ ⎤
D(ω0 )
⎢ D(ω1 ) ⎥
⎢ ⎥
⎢ ⎥
⎢ .
.. ⎥
=⎢ ⎥
⎢  ⎥
⎢D ω N −1 ⎥
⎣  2 ⎦
D ω N +1
2

The above simultaneous equations can be solved in principle for the unknown
parameters provided that the locations of the (N + 3)/2 extremal frequencies
are known a priori. This problem is resolved by the Remez exchange algorithm
outlined below.

The Remez Exchange Algorithm:


This algorithm is a very efficient iterative procedure for determining the
locations of the extremal frequencies, and is composed of the following steps.
Step 1: Choose a set of initial values for the extremal frequencies, or use the
values available from the termination of the previous iteration.
124 FIR Digital Filter Design

Step 2: Compute the value ε by solving (5.87).


Step 3: Compute the values of the amplitude response M (ω) at ω = ωi using

(−1)i ε
M (ωi ) = + D(ωi ) for 0 ≤ i ≤ (N + 1)/2 (5.88)
W (ωi )

Step 4: Determine the polynomial M (ω) by interpolating the above values at


the (N + 3)/2 extremal frequencies using the Lagrange interpolation
formula
N +1

2

M (ω) = M (ωk )Pk [cos(ω)] (5.89)


k=0

where
N +1
2
cos(ω) − cos(ωl )
Pk [cos(ω)] = Π for 0 ≤ k ≤ (N + 1)/2
l=0, l=k cos(ωk ) − cos(ωl )

Step 5: Compute the new weighted error function ε(ω) of (5.76) at a dense
set S of frequencies where S >> (N − 1)/2 and the transition band
is excluded. Setting S  16(N − 1) is adequate in practice.
Step 6: Determine the (N + 3)/2 new extremal frequencies from the values
of ε(ω) evaluated at the dense set of frequencies.
Step 7: Stop if the peak values ε are approximately equal. Otherwise, go back
to Step 2.
The above arguments can be applied to the other type of linear-phase FIR
digital filtes (Type 2, Type 3, and Type 4) with the slight modifications of the
algorithm [3].

5.8 Cascaded Lattice Realization of FIR Digital Filters


For the cascaded lattice realization, an (N − 1)th-order FIR transfer function
is assumed to be of the form
HN (z) = h0 + h1 z −1 + · · · + hN −1 z −(N −1) (5.90)
which is related to
H̃N (z) = hN −1 + hN −2 z −1 + · · · + h0 z −(N −1)
(5.91)
= z −(N −1) HN (z −1 )
5.8 Cascaded Lattice Realization of FIR Digital Filters 125

From (5.90) and (5.91), it follows that


 
hN −1 HN (z) − h0 H̃N (z) = z −1 h1 + h2 z −1 + · · · + hN −1 z −(N −2)

= z −1 HN −1 (z)

hN −1 H̃N (z) − h0 HN (z) = hN −1 + hN −2 z −1 + · · · + h1 z −(N −2)

= H̃N −1 (z)
(5.92)
which is equivalent to
! "! " ! "
hN −1 −h0 HN (z) z −1 HN −1 (z)
= (5.93)
−h0 hN −1 H̃N (z) H̃N −1 (z)

where ⎡ ⎤ ⎡ ⎤
h1h1 hN −1 − h0 hN −2
⎢ ⎥ ⎢h h
h2 ⎥
⎢ ⎥ ⎢ 2 N −1 − h0 hN −3 ⎥
⎢ ⎥=⎢ ⎥
⎢ ⎥ ⎢
.. .. ⎥
⎣ ⎦ ⎣. . ⎦

hN −1 2
hN −1 − h02

Assuming that hN =1 = ±h0 , (5.93) can be expressed as


! " ! "! "! "
HN (z) 1 k0 z −1 0 HN −1 (z)
= Δ0 (5.94)
H̃N (z) k0 1 0 1 H̃N −1 (z)

where
h0 hN −1
k0 = , Δ0 =
hN −1 (hN −1 + h0 )(hN −1 − h0 )
A block diagram of the system in (5.94) is illustrated in Figure 5.10.
Similarly, from the transfer functions HN −1 (z) and H̃N −1 (z), we obtain
! " ! "! "! "
HN −1 (z) 1 k1 z −1 0 HN −2 (z)
= Δ1 (5.95)
H̃N −1 (z) k1 1 0 1 H̃N −2 (z)
126 FIR Digital Filter Design

0
HN-1(z) z -1 HN (z)
k0
k0
~ 0 ~
HN-1(z) HN ( z)
Figure 5.10 Normalized lattice structure of a section.

where
h1 hN −1
k1 = , Δ1 =
hN −1 (hN −1 + h1 )(hN −1 − h1 )
Eventually, we arrive at
! " ! "! "! "
H2 (z) 1 kN −2 z −1 0 H1 (z)
= ΔN −2
H̃2 (z) kN −2 1 0 1 H̃1 (z)
! "! " (5.96)
1 kN −2 z −1
= ΔN −2 ΔN −1
kN −2 1 1

where H1 (z) = H̃1 (z) = ΔN −1 . By substituting (5.95) into (5.94), we obtain


! " ! "! "! "! "! "
HN (z) 1 k0 z −1 0 1 k1 z −1 0 HN −2 (z)
= Δ0 Δ1
H̃N (z) k0 1 0 1 k1 1 0 1 H̃N −2 (z)
(5.97)
A block diagram of the system in (5.97) is shown in Figure 5.11. Moreover,
by making use of (5.96) and (5.97), we have

0 1
HN-2(z) z -1 z -1 HN (z)
k1 k0
k1 k0
0 1
~ ~
HN-2(z) HN ( z)
Figure 5.11 Normalized lattice structure of cascaded two sections.
5.8 Cascaded Lattice Realization of FIR Digital Filters 127
! " ! "! "! "! "
HN (z) 1 k0 z −1 0 1 k1 z −1 0
=Δ ···
H̃N (z) k0 1 0 1 k1 1 0 1
! " ! −1 " (5.98)
1 kN −2 z
kN −2 1 1

where Δ = Δ0 Δ1 · · · ΔN −1 . A block diagram of the system in (5.98) is


depicted in Figure 5.12.
Next, we consider a linear-phase FIR digital filter with even integer N ,
i.e., odd order N − 1. In this case, since hi = ±hN −1−i holds for i =
0, 1, · · · , N − 1, we can write
N
N −1 −1 N −1
 
2 
−i
H(z) = hi z = hi z −i ± hN −1−i z −i
i=0 i=0 i= N (5.99)
2

N
= H N (z) ± z − 2 H̃ N (z)
2 2

where positive sign is for the symmetric case, while negative sign is for the
antisymmetric case and
N
H N (z) = h0 + h1 z −1 + · · · + h N −1 z −( 2 −1)
2 2

N
H̃ N (z) = h N −1 + h N −2 z −1 + · · · + h0 z −( 2 −1)
2 2 2

A block diagram of the system in (5.99) is drawn in Figure 5.13 where the
(N/2 − 1)th-order lattice structure is used and Δ = Δ0 Δ1 · · · ΔN/2−1 .

Input
z -1 z -1 z -1 HN (z)
kN - 2 k1 k0
kN - 2 k1 k0
~
HN ( z)

Figure 5.12 Cascaded lattice structure of an FIR digital filter.


128 FIR Digital Filter Design

Input
z -1 z -1
kN/2-2 k0 Output
kN/2-2 k0
z -N/2

Figure 5.13 The lattice structure of linear-phase FIR digital filters.

Notice that since h0 = ±hN −1 , h2N −1 − h20 = 0 holds, the (N − 1)th-order


lattice structure is not available in the linear-phase FIR digital filter.

5.9 Numerical Experiments


As a numerical example, Suppose that the desired frequency response of a
lowpass digital filter is specified by

⎨ e−j N2−1 ω for |ω| ≤ ωp

D(e ) =
⎩ 0 for |ω| ≥ ωs

where the passband edge and the stopband edge are ωp = 0.3π and ωs =
0.35π, respectively, and the order of the filter is assumed to be N − 1 = 30.

5.9.1 Least-Squares Design


5.9.1.1 Quadratic measure minimization
When the weighting parameters in (5.54) were chosen as α = β = 1/2,
c = (c0 , c1 , · · · , c15 )T = (h15 , 2h14 , · · · , 2h0 )T was computed from (5.57)
yielding

⎡ ⎤ ⎡ ⎤
h0 h1 h2 h3 0.058420 0.154451 0.109441 −0.069545
⎢ h4 h5 h6 h7 ⎥ ⎢ −0.226197 −0.181544 0.078904 0.339349 ⎥
⎢ ⎥ = 10 ⎢
−1 ⎥
⎣ h8 h9 h10 h11 ⎦ ⎣ 0.311192 −0.085985 −0.565692 −0.620779 ⎦
h12 h13 h14 h15 0.090398 1.411959 2.704779 3.241437

The magnitude response of the resulting filter is shown in Figure 5.14.

5.9.1.2 Eigenfilter method


The same design problem was addressed using the eigenfilter method. In
this case, the eigenvector c1 = (c0 , c1 , · · · , c15 )T = (h15 , 2h14 , · · · , 2h0 )T
5.9 Numerical Experiments 129

|H(e )|

|H(e )| 0

−20
1

Gain, dB
−40

−60
0.5
−80

−100
0 0.2 0.4 0.6 0.8 1
0 0.5 1 ω/π ω/π

Figure 5.14 The magnitude response of the resulting filter.


jω |H(e )|
|H(e )|
0

1 −20
Gain, dB

−40

0.5 −60

−80

−100
0 0.2 0.4 0.6 0.8 1
0 0.5 1 ω/π ω/π

Figure 5.15 The magnitude response of the resulting filter.

corresponding to the minimum eigenvalue λmin = 2.392118 × 10−3 which


satisfies (5.61) was computed as
⎡ ⎤ ⎡ ⎤
h0 h1 h2 h3 0.048062 0.145306 0.101539 −0.077514
⎢ ⎥ ⎢ ⎥
⎢ h4 h5 h6 h7 ⎥ −1 ⎢ −0.235591 −0.192292 0.068380 0.330571 ⎥
κ⎢ ⎥ = 10 ⎢ ⎥
⎣ h8 h9 h10 h11 ⎦ ⎣ 0.303905 −0.093823 −0.575924 −0.632817 ⎦
h12 h13 h14 h15 0.079784 1.406269 2.704706 3.238881

where κ = D(1)/1T c1 so that D(1) = H(1) = 1. The magnitude response


of the resulting filter is shown in Figure 5.15.

5.9.2 Analytical Approach


5.9.2.1 General FIR filter design
Consider designing a lowpass digital filter that approximates the desired
frequency response
130 FIR Digital Filter Design
⎧ N −1

⎪ e−j 2 ω for 0 ≤ ω ≤ 0.3π





⎪ 20 N −1

⎪ (− ω + 7) e−j 2 ω for 0.3π < ω < 0.35π

⎨ π

D(e ) = 0 for 0.35π ≤ ω| ≤ 1.65π



⎪ 20

⎪ (
N −1
ω − 33) e−j 2 ω for 1.65π < ω < 1.7π

⎪ π



⎩ N −1
e−j 2 ω for 1.7π ≤ ω < 2π

and M = 200 was chosen in (5.63). The coefficient vector h =


(h0 , h1 , · · · , h30 )T was computed from (5.66) as
⎡ ⎤ ⎡ ⎤
h0 h1 h2 h3 0.064876 0.184934 0.134657 −0.071201
⎢ h4 h5 h6 h7 ⎥ ⎢ −0.250137 −0.204317 0.076366 0.355871 ⎥
⎢ ⎥ ⎢ ⎥
⎢h h9 h10 h11 ⎥ ⎢ 0.329948 −0.080191 −0.574339 −0.634091 ⎥
⎢ 8 ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥
⎢h12 h13 h14 h15 ⎥ −1 ⎢ 0.082541 1.412721 2.711472 3.250000 ⎥
⎢ ⎥ = 10 ⎢ ⎥
⎢h16 h17 h18 h19 ⎥ ⎢ 2.711472 1.412721 0.082541 −0.634091 ⎥
⎢ ⎥ ⎢ ⎥
⎢h20 h21 h22 h23 ⎥ ⎢ −0.574339 −0.080191 0.329948 0.355871 ⎥
⎢ ⎥ ⎢ ⎥
⎣h24 h25 h26 h27 ⎦ ⎣ 0.076366 −0.204317 −0.250137 −0.071201 ⎦
h28 h29 h30 0.134657 0.184934 0.064876

The magnitude response of the resulting filter is drawn in Figure 5.16.

5.9.2.2 Linear-Phase FIR filter design


We now consider designing a lowpass digital filter that approximates the
desired frequency response


|H(e )|

|H(e )| 0

−20
1
Gain, dB

−40

−60
0.5
−80

−100
0 0.2 0.4 0.6 0.8 1
0 0.5 1 ω/π ω/π

Figure 5.16 The magnitude response of the resulting filter.


5.9 Numerical Experiments 131
⎧ N −1

⎪ e−j 2
ω
for 0 ≤ ω ≤ 0.3π


D(ejω ) = 20 N −1

⎪ (− ω + 7) e−j 2 ω for 0.3π < ω < 0.35π



⎪ π

0 for 0.35π ≤ ω ≤ π

and M = 100 was chosen in (5.68). The vector c = (c0 , c1 , · · · , c15 )T =


(h15 , 2h14 , · · · , 2h0 )T was computed from (5.74) as

⎡ ⎤ ⎡ ⎤
h0 h1 h2 h3 0.063576 0.183391 0.133357 −0.072744
⎢ h4 h5 h6 h7 ⎥ ⎢ 0.354328 ⎥
⎢ ⎥ = 10−1 ⎢ −0.251437 −0.205860 0.075066 ⎥
⎣ h8 h9 h10 h11 ⎦ ⎣ 0.328648 −0.081734 −0.575639 −0.635634 ⎦
h12 h13 h14 h15 0.081241 1.411178 2.710172 3.248457

The magnitude response of the resulting filter is shown in Figure 5.17.

5.9.3 Chebyshev Approximation


The FIR digital filter was described by (5.75) with N = 31. The set of
frequencies S in Step 5 of the Remez exchange algorithm was chosen to be

S = {ωi ∈ Ω | 0 ≤ ωi ≤ ωp } ∪ {ωi ∈ Ω | ωs ≤ ωi ≤ π}

where Ω = {ωi = (π/500)i | i = 0, 1, · · · , 500}. With ωp = 0.3π and


ωs = 0.35π, we have

{ωi ∈ Ω | 0 ≤ ωi ≤ ωp } = {ωi | i = 0, 1, · · · , 150}


{ωi ∈ Ω | ωs ≤ ωi ≤ π} = {ωi | i = 175, 176, · · · , 500}


|H(e )|

|H(e )| 0

−20
1
Gain, dB

−40

−60
0.5
−80

−100
0 0.2 0.4 0.6 0.8 1
0 0.5 1 ω/π ω/π

Figure 5.17 The magnitude response of the resulting filter.


132 FIR Digital Filter Design

We now denote the elements of set S as S = {ωi | i = 0, 1, · · · , 476} where

(π/500)i for i = 0, 1, · · · , 150


ωi =
(π/500)(i + 24) for i = 151, 152, · · · , 476
In the Remez exchange algorithm, the initial values for the extremal
frequencies were chosen to be

ωround{(N s −1)/16}r
for r = 0, 1, · · · , 16

where Ns = 477 is the number of elements in set S and (N + 1)/2 = 16. In


other words, the extremal frequencies were expressed in terms of the ωi ’s as
! "
ω0 ω30 ω60 ω89 ω119 ω149 ω203 ω232 ω262
ω292 ω322 ω351 ω381 ω411 ω441 ω470 ω500

Then vector c = (c0 , c1 , · · · , c15 )T = (h15 , 2h14 , · · · , 2h0 )T was computed


from (5.87) and we obtained
⎡ ⎤ ⎡ ⎤
h0 h1 h2 h3 −0.034827 0.601336 0.158336 −0.069666
⎢ h4 h5 h6 h7 ⎥⎥ ⎢
−1 ⎢ −0.248594 −0.203219 0.075809 0.353959 ⎥
⎢ = 10 ⎣ ⎥
⎣ h8 h9 h10 h11 ⎦ 0.328456 −0.079881 −0.572778 −0.632930 ⎦
h12 h13 h14 h15 0.082451 1.412009 2.711147 3.250054

The magnitude response of the resulting filter is shown in Figure 5.18.

5.9.4 Comparison of Algorithms’ Performances


In order to compare their performances, the design results obtained above are
summarized in Table 5.2 where


|H(e )|

|H(e )| 0

−20
1
Gain, dB

−40

−60
0.5
−80

−100
0 0.2 0.4 0.6 0.8 1
0 0.5 1 ω/π ω/π

Figure 5.18 The magnitude response of the resulting filter.


5.10 Summary 133

Table 5.2 Performance comparisons among algorithms


Max. Negative Ripple
Algorithms ε2 ε∞ on Range 0 ≤ i ≤ 1000
Quadratic Measure Minimization 5.170751 20.051857 –0.045462
Eigenfilter Method 5.246651 19.864566 –0.046884
General FIR Filter Design 5.349644 17.245433 –0.059975
Linear-Phase FIR Filter Design 5.345929 17.273949 –0.060041
Chebyshev Approximation 12.731666 10.142531 –0.101418

&'
1000  2
|D(ejωi )| − |H(ejωi )|
i=0
ε2 = &' × 100
1000 jωi )|2
i=0 |D(e
% %
max % |D(ejωi )| − |H(ejωi )| %
0≤i≤1000
ε∞ = × 100
max |D(ejωi )|
0≤i≤1000

where ωi = πi/1000 for i = 0, 1, 2, · · · , 1000 and set D(ejωi ) = H(ejωi ) =


0 for ωi in the transition band 0.3π < ωi < 0.35π.

5.10 Summary
This chapter has shown that exact linear-phase responses can be achieved by
imposing either symmetric or antisymmetric condition on the FIR filter’s coef-
ficients. Several window functions have been introduced with their application
to FIR digital filter design. An approach for designing least squares linear-
phase FIR digital filters that minimize a quadratic measure has been studied.An
eigenfilter method for designing least squares linear-phase FIR digital filters
has been presented in which an eigenvector corresponding to the minimum
eigenvalue of a symmetric positive-definite matrix has been computed to
obtain the optimal solution. A closed-form least square solution to the problem
of analytically designing linear-phase FIR digital filters has been given. The
Parks-McClellan algorithm based on the minimax optimality criterion has
been reviewed. A method for realizing FIR digital filters by cascaded lattice
forms has also been examined. Finally, performance comparisons among these
algorithms have been performed through a numerical example.
134 FIR Digital Filter Design

References
[1] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing,
NJ: Prentice-Hall, 1975.
[2] A. Antoniou, Digital Filters, 2nd ed. NJ: McGraw-Hill, 1993.
[3] S. K. Mitra, Digital Signal Processing, 3rd ed. NJ: McGraw-Hill, 2006.
[4] S. Takahashi and M. Ikehara, Digital Filters, Tokyo, Japan, Baifukan,
1999.
[5] P. P. Vaidyanathan and T. Q. Nguyen, “Eigenfilters: A new approach to
least-squares FIR filter design and applications including Nyquist filters,”
IEEE Trans. Circuits Syst., vol. CAS-34, no. 1, pp. 11–23, Jan. 1987.
[6] M. O. Ahmad and J.-D. Wang, “An analytical least square solution to
the design problem of two-dimensional FIR filters with quadrantally
symmetric or antisymmetric frequency response,” IEEE Trans. Circuits
Syst., vol. 36, no. 7, pp. 968–979, July 1989.
[7] T. W. Parks and J. H. McClellan, “Chebyshev approximation for non-
recursive digital filters with linear phase,” IEEE Trans. Circuits Theory,
vol. CT-19, no. 2, pp. 189–194, Mar. 1972.
[8] T. Higuchi, Fundamentals of Digital Signal Processing, Tokyo, Japan,
Shokodo, 1986.
[9] M. Hagiwara, Digital Signal Processing, Tokyo, Japan, Morikita Pub-
lishing, 2001.
6
Design Methods Using Analog Filter Theory

6.1 Preview
One of the approaches to the design of an IIR digital filter is to use analog
filter theory in conjunction with bilinear transformation that maps frequencies
in the analog domain to the digital domain. This indirect design method works
well, especially for the design of standard IIR digital filters such as lowpass,
highpass, bandpass, and bandstop filters. This chapter starts by a brief review
of several design techniques for analog filters. These include designs based
on Butterworth, Chebyshev, inverse-Chebyshev, and elliptic approximations
as well as analog-filter approximations by transformations which transform
normalized lowpass analog filters to denormalized lowpass, highpass, band-
pass, and bandstop analog filters. The bilinear transformation method for the
design of IIR digital filters is then studied in detail and illustrated by a design
example.

6.2 Design Methods Using Analog Filter Theory


Standard IIR digital filters such as lowpass, highpass, bandpass, and
bandstop filters can be designed through indirect methods in which
a continuous-time transfer function satisfying certain specifications is
obtained by a standard analog-filter approximation, and then a corres-
ponding discrete-time transfer function is obtained by one of the following
methods: invariant-impulse-response method and its variants, matched-z
transformation method, and bilinear transformation method [1]. Loss function
is a concept often involved in the study of analog filters. A loss function L(–s2 )
is related its corresponding transfer function H(s) as

135
136 Design Methods Using Analog Filter Theory

D(s)D(−s) N (s)
L(−s2 ) = , H(s) =
N (s)N (−s) D(s)
where N (s) and D(s) are polynomials in s. To ensure the stability of the
analog approximation, the poles of H(s) (i.e. the zeros of polynomial D(s))
must lie strictly inside the left-half s-plane.

6.2.1 Lowpass Analog-Filter Approximations


6.2.1.1 Butterworth approximation
The transfer function of the nth-order normalized lowpass Butterworth filter
assumes the form
1
HN (s) = n
i=1 (s − pi )
where pi for i = 1, 2,· · · , n are the left-half s-plane zeros of the corresponding
loss function L(–s2 ), which are given by
2n

L(−s2 ) = 1 + (−s2 )n = (s − sk )
k=1

where 
ej(2k−1)π/2n for even n
sk =
ej(k−1)π/n for odd n
The term “normalized” refers to the constraint that at s = jω with ω = 1,
L(1) = 2. As a result, the magnitude of √
the normalized Butterworth filter at
ω = 1 assumes the value |HN (j)| = 1 2  0.707, namely a 3 dB loss
relative to the filter gain at ω = 0.

6.2.1.2 Chebyshev approximation


The magnitude response of the lowpass Butterworth filter is a monotonically
increasing function of ω. A more balanced characteristic may be achieved
using the Chebyshev approximation where the magnitude response in pass-
band oscillates between one and a less-than-one value 10−0.05Ap which means
an Ap dB loss in passband.
The normalized transfer function HN (s) of nth-order lowpass Chebyshev
filter is given by [1]
H0
HN (s) = r
D0 (s) i=1 (s − pi )(s − p∗i )
6.2 Design Methods Using Analog Filter Theory 137

where
 
(n − 1)/2 for odd n s − 1/p0 for odd n
r= and D0 (s) =
n/2 for even n 1 for even

and constant H0 and poles pi are calculated for a given Ap > 0 (in dB) as
follows:
 1/2
ε = 100.1Ap − 1

1 1
p0 = σ(n+1)/2 with σ(n+1)/2 = − sinh sinh−1
n ε
pi = σi + jωi for i = 1, 2, . . . , r

1 1 (2i − 1)π
σi = − sinh sinh−1 sin
n ε 2n

1 1 (2i − 1)π
ωi = cosh sinh−1 cos
n ε 2n
 r 2
−p0 i=1 |pi | for odd n
H0 = 
10−0.05Ap ri=1 |pi |2 for even n

6.2.1.3 Inverse-Chebyshev approximation


The inverse-Chebyshev filters are closely related to the Chebyshev filters,
whose magnitude response is a monotonically decreasing function of ω in the
passband and oscillates between zero and a prescribed minimum attenuation
 0.1A −1/2
10 a − 1 (with Aa > 0 in dB) in the stopband.
The normalized transfer function of the nth-order lowpass inverse-
Chebyshev filter is given by [1]
r
H0  (s − 1/zi )(s − 1/zi∗ )
HN (s) =
D0 (s) (s − 1/pi )(s − 1/p∗i )
i=1

where
 
(n − 1)/2 for odd n s − 1/p0 for odd n
r= and D0 (s) =
n/2 for even n 1 for even n

and constant H0 , zeros zi , and poles pi are calculated for a given Aa > 0 (in
dB) as follows:
138 Design Methods Using Analog Filter Theory
 −1/2
δ = 100.1Aa − 1
zi = j cos (2i−1)π
2n for i = 1, 2, · · · , r
1 
p0 = σ(n+1)/2 with σ(n+1)/2 = − sinh n sinh−1 1
δ

pi = σi + jωi for i = 1, 2, · · · , r
1 
σi = − sinh n sinh−1 1
δ sin (2i−1)π
2n
1 
ωi = cosh n sinh−1 1
δ cos (2i−1)π
2n

⎧ r |zi |2
⎪ 1
⎪ for odd n
⎨ −p0 i=1 |pi |2
H0 =

⎩ r
⎪ |zi |2
for even n
i=1 |pi |2

6.2.1.4 Elliptic approximation


Elliptic filters are a class of analog filters that are more efficient than the
Butterworth, Chebyshev and inverse-Chebyshev filters, in which the mag-
nitude response oscillates between one and a maximum passband loss in
passband and oscillates between zero and a minimum stopband attenuation in
stopband.
Given a selectivity factor k > 0, a maximum passband loss of Ap dB
and a minimum stopband attenuation of Aa dB, the transfer √ function of a
normalized lowpass
√ elliptic filter with passband edge ωp = k and stopband
edge ωa = 1/ k assumes the form
r
H0  s2 + a0i
HN (s) =
D0 (s) s2 + b1i s + b0i
i=1

where
 
(n − 1)/2 for odd n s + σ0 for odd n
r= and D0 (s) =
n/2 for even n 1 for even n

and constant H0 and transfer-function coefficients can be evaluated using the


following formulas [1]:
6.2 Design Methods Using Analog Filter Theory 139
√  √ 
1 1−√k
k = 1 − k2 , q0 = 2 1+ k

q = q0 + 2q05 + 15q09 + 150q013


100.1Aa − 1 log 16D
D= , n≥
100.1Ap − 1 log(1/q)
1 100.05Ap + 1
Λ= ln 0.05Ap
2n 10 −1
 
 2q 1/4 ∞ (−1)m q m(m+1) sinh [(2m + 1)Λ] 
 m=0 
σ0 =   
 1+2 ∞ m=1 (−1)m q m2 cosh 2mΛ 

  σ2

W= 1 + kσ02 1 + k0
∞ m m(m+1) sin (2m+1)πμ
2q 1/4 m=0 (−1) q n
Ωi = ∞
1+ 2 m=1 (−1)m q m2 cos 2mπμ
n
where

⎨ i for odd n
μ= i = 1, 2, · · · , r
⎩ i − 1 for even n
2
 
  Ω2 1
Vi = 1 − kΩ2i 1− i , a0i =
k Ω2i
(σ0 Vi )2 + (Ωi W )2 2σ0 Vi
b0i =  2 , b1i =
1 + σ02 Ω2i 1 + σ02 Ω2i

⎪ 
r b
0i

⎨ σ0 for odd n
a
i=1 0i
H0 = r b

⎪ 0i
⎩ 10−0.05Ap for even n
a
i=1 0i

The actual minimum stopband attenuation is given by


 0.1Ap
10 −1
Aa = 10 log +1
16q n
The series involved in calculating σ0 and Ωi converge rapidly, and 3 or 4
terms are sufficient for most designs.
140 Design Methods Using Analog Filter Theory

6.2.2 Other Analog-Filter Approximations by Transformations


Denormalized lowpass, highpass, bandpass, and bandstop approximations can
be deduced from normalized lowpass approximations using transformations
of the form s = f (s̄) [1]. In what follows, HN (s) denotes the transfer
function of a normalized lowpass analog filter with stopband and passband
edges ωp and ωa , respectively.

6.2.2.1 Lowpass-to-lowpass transformation


The transformation
s = λs̄ (6.1)
maps the ranges [0, jωp ] and [jωa , j∞) onto the ranges [0, jωp /λ] and
[jωa /λ, j∞), respectively. Hence

HLP(s̄) = HN (s)|s=λs̄

is a denormalized lowpass approximation with passband edge ωp /λ and


stopband edge ωa /λ.

6.2.2.2 Lowpass-to-highpass transformation


The transformation
λ
s= (6.2)

maps the ranges [0, jωp ] and [jωa , j∞) onto the ranges −j∞, −jλ/ωp ]
and [−jλ/ωa , 0], respectively. Hence

HHP(s̄) = HN (s)|s=λ/s̄

is a denormalized highpass approximation with stopband edge λ/ωa and


passband edge λ/ωp .

6.2.2.3 Lowpass-to-bandpass transformation


A transformation that converts a normalized lowpass approximation HN (s)
to a bandpass approximation is given by

1 ω2
s= s̄ + 0 (6.3)
B s̄
where B and ω0 are constants. The passband and stopband edges of the
transformed bandpass filter are given by
6.2 Design Methods Using Analog Filter Theory 141
  2
ωp B ωp B
ω̄p1 , ω̄p2 = ∓ + ω02 +
2 2
  2
ωa B ωa B
ω̄a1 , ω̄a2 = ∓ + ω02 +
2 2

6.2.2.4 Lowpass-to-bandstop transformation


A transformation that converts a normalized lowpass approximation HN (s)
to a bandstop approximation is given by
Bs̄
s= (6.4)
s̄2 + ω02
where B and ω0 are constants. The passband and stopband edges of the
transformed bandpass filter are given by
  2
B B
ω̄p1 , ω̄p2 = ∓ 2ωp + ω02 + 2ωp
   2
B B
ω̄a1 , ω̄a2 = ∓ 2ωa
+ ω02 + 2ωa

6.2.3 Design Methods Based on Analog Filter Theory


In order for an IIR digital filter to be realizable, the transfer function of an IIR
filter must be a rational function of z with the degree of numerator polynomial
equal to or less than that of the denominator polynomial, and its poles must lie
within the unit circle of the z plane. These conditions are called the realizability
constraints [1].

6.2.3.1 Invariant impulse-response method


Let HA (s) = N (s)/D(s) be an analog IIR filter whose impulse response is
denoted by hA (t). The digital IIR filter designed by the invariant impulse-
response method requires that the impulse response hD (k) of the digital filter
exactly equals to equally spaced samples of the impulse response hA (t) [2],
namely

hD (k) = hA (t)|t=kT = hA (kT ) (6.5)


142 Design Methods Using Analog Filter Theory

The transfer function of the digital filter can be expressed in terms of its
impulse response as
∞
HD (z) = hD (k)z −k (6.6)
k=0

and HA (s) can be expanded in terms of partial fractions as


n
 Ki
HA (s) = (6.7)
s − pi
i=1

where pi for i = 1, 2, · · · , n are the poles of HA (s). Consequently, we have


n

hA (t) = Ki epi t
i=1

and
n
 n
  k
hA (kT ) = Ki epi kT = Ki epi T
i=1 i=1
which in conjunction with (6.5) leads (6.6) to an IIR transfer function
∞ 
 n  n
  p T k −n  Ki z
HD (z) = Ki e i
z = (6.8)
z − epi T
k=0 i=1 i=1

It is noted that HD (z) in (6.8) also has n poles and that if the analog HA (s)
is stable, i.e., all pi have negative real parts, then the magnitudes of eP iT are
less than one, hence the digital HD (z) is also stable.
The design method may be summarized in three steps as follows:
1. Design a prototype analog filter with transfer function HA (s).
2. Obtain a partial fraction expansion of HA (s) as in (6.7).
3. Use the values of Ki and pi obtained from Step 2 to construct a digital
transfer function HD (z) using (6.8).
One may choose one of the methods in Sections 6.2.1 and 6.2.2 to implement
Step 1. To illustrate Steps 2 and 3 of the above method, consider the
transfer function of the sixth-order normalized lowpass Butterworth filter
given by [1]
1
HA (s) =
(s2 + 0.517638s + 1)(s2 + 1.414214s + 1)(s2 + 1.931852s + 1)
6.2 Design Methods Using Analog Filter Theory 143

The partial fraction expansion of HA (s) is found to be


6
 Ki
HA (s) =
s − pi
i=1

with
K1,2 = −1.523603, K3,4 = 0.204124 ± j0.353553
K5,6 = 1.319479 ∓ j2.285412, p1,2 = −0.707107 ± j0.707107
p3,4 = −0.258819 ± j0.965926, p5,6 = −0.965926 ± j0.258819
Now using (6.4) with T = 1, one obtains
N1 (z) N2 (z) N3 (z)
HD (z) = + +
D1 (z) D2 (z) D3 (z)
where
N1 (z) −3.047201z 2 + 1.143354z
= 2
D1 (z) z − 0.749706z + 0.243117
N2 (z) 0.408248z 2 − 0.628224z
= 2
D2 (z) z − 0.877962z + 0.595926
N3 (z) 2.638959z 2 − 0.525732z
= 2
D3 (z) z − 0.735906z + 0.144880
The largest magnitude of the poles of HD (z) is 0.771963, hence the IIR digital
filter is stable. The magnitude response of HD (z) is depicted in Figure 6.1.

6.2.3.2 Bilinear-transformation method


Basic Concepts and Properties
The bilinear-transformation method is a frequency-domain method to convert
an analog prototype filter into a desired digital filter. In this process, passbands
and stopbands in the analog filter are translated into the corresponding pass-
bands and stopbands of the digital filter with the stability, passband ripple, and
stopband attenuation preserved. As such, the bilinear transformation method
has been one of the most important methods for the design of IIR digital filters.
The bilinear transformation is a mapping that is linear in the numerator as
well as the denominator and is given by

1 z−1
s= (6.9)
T z+1
144 Design Methods Using Analog Filter Theory

Figure 6.1 Magnitude response of the 6th-order IIR filter.

where T is the sampling interval in seconds. Application of the bilinear


transformation to an analog transfer function HA (s) leads to a digital transfer
function HD (z) with
HD (z) = HA (s)|s= 1 ( z−1 )
T z+1

The time-domain response of the digital filter so obtained is approximately the


same as that of the prototype analog filter, and the two time-domain responses
get closer as T gets smaller [1]. From (6.9), it follows that
T /2 + s
z=
T /2 − s
which with s = σ + jω and z = rejθ gives
 2 1/2
2
T +σ + ω2
r=  2 (6.10a)
2
T −σ + ω2
6.2 Design Methods Using Analog Filter Theory 145

and
ω ω
θ = tan−1 + tan−1 (6.10b)
T /2 + σ T /2 − σ
From (6.10a) and (6.10b), it is concluded that the bilinear transformation maps
the open right-half s plane onto the exterior to the unit circle of the z plane,
j axis of the s plane onto the unit circle of the z plane, and open left-half s
plane onto the interior of the unit circle of the z plane. In addition, if in (6.9)
we let s = jω and z = ejΩT , then we obtain a frequency interpretation of the
bilinear transformation that
2 ΩT
HD (ejΩT ) = HA (jω) if ω = tan (6.11)
T 2
Note that ω  Ω if Ω ≤ 0.3/T . In other words, the digital filter has the
same frequency response as the prototype analog filter as long as Ω ≤ 0.3/T .
For high frequencies the relation between ω and Ω becomes nonlinear as can
be seen from (6.11). The distortion introduced by the bilinear transformation
in the frequency scale of the digital filter is known as the warping effect
[1]. As far as the amplitude response is concerned, the warping effect can be
eliminated by prewarping the analog filter as follows. Suppose ω1 , ω2 , · · · , ωL
are the passband and stopband edges of the analog filter. In order for the digital
filter to have passband and stopband edges Ω1 , Ω2 , · · · , ΩL , the analog filter
should be prewarped before application of the bilinear transformation so that
its passband and stopband edges ω1 , ω2 , · · · , ωL satisfy
2 Ωi T
ωi = tan for i = 1, 2, · · · , L (6.12)
T 2
Lowpass, highpass, bandpass, and bandstop digital IIR filters satisfying
prescribed specifications can be designed by first transforming a normali-
zed analog lowpass transfer function HN (s) into a denormalized lowpass,
highpass, bandpass, and bandstop transfer function (see Section 6.2.3.2):

HX (s̄) = HN (s)|s=fX (s̄) (6.13)

and then applying the bilinear transformation to HX (s̄) to obtain a digital


transfer function
HD (z) = HX (s̄)|s̄= 2 ( z−1 ) (6.14)
T z+1

It is of interest to note that the second step of the design can be carried out
with ease when the transfer function of the analog filter is given in terms of
146 Design Methods Using Analog Filter Theory

its poles and zeros as [3]


m (a)
(s − zi )
HA (s) = H0 i=1 (a)
(6.15)
n
i=1 (s − pi )
The application of bilinear transformation to (6.15) yields
m
n−m i=1 (z − zi )
HD (z) = A(z + 1) n (6.16a)
i=1 (z − pi )

where
T (a) T (a)
1+ 2 zi 1+ 2 pi
zi = , pi = (6.16b)
T (a) T (a)
1− 2 zi 1− 2 pi
and m  2 (a)

i=1 T − zi
A=    (6.16c)
n 2 (a)
i=1 T − pi
Design Procedure
To describe the design procedure in detail, let the design specifications of a
digital IIR filter be given in terms of passband edge Ωp (Ωp1 and Ωp2 for
bandpass and bandstop filters) (rad/s), stopband edge Ωa (Ωa1 and Ωa2
for bandpass and bandstop filters) (rad/s), sampling frequency ωs (rad/s),
maximum passband loss Ap (dB), and minimum stopband attenuation Aa
(dB). The sampling period T is evaluated as T = 2π/ωs .
1. For lowpass (LP) and highpass (HP) filters, compute parameter K0 as

tan(Ωp T /2)
K0 = (6.17)
tan(Ωa T /2)
For bandpass (BP) and bandstop (BS) filters, compute
Ωp2 T Ωp1 T Ωp1 T Ωp2 T
KA = tan 2 − tan 2 , KB = tan 2 tan 2

KA tan(Ωa1 T /2)
KC = tan Ωa1 T Ωa2 T
2 tan 2 , K1 =
KB − tan2 (Ωa1 T /2)

KA tan(Ωa2 T /2)
K2 = − KB
tan2 Ωa2 T /2
6.2 Design Methods Using Analog Filter Theory 147

2. Determine n and ωp . For elliptic filters, also determine k.


• For Butterworth filters, first compute K as

⎪ K0 for LP



⎨ 1/K0 for HP
K=

⎪ K1 (if KC ≥ KB ) or K2 (if KC < KB ) for BP



1/K2 (if KC ≥ KB ) or 1/K1 (if KC < KB ) for BS
(6.18a)
then compute
100.1Aa − 1 log D  1/2n
D= , n≥ , ωp = 100.1Ap − 1
100.1Ap − 1 2 log(1/K)
(6.18b)
• For Chebyshev filters, first compute K using (6.18a), then compute

100.1Aa − 1 cosh−1 D
D = 0.1Ap , n≥ , ωp = 1 (6.19)
10 −1 cosh−1 (1/K)
• For elliptic filters, compute k using

⎪ K0 for LP



⎨ 1/K0 for HP
k=

⎪ K1 (if KC ≥ KB ) or K2 (if KC < KB ) for BP



1/K2 (if KC ≥ KB ) or 1/K1 (if KC < KB ) for BS
(6.20a)
then compute ωp using
⎧ √
⎪ K0 for LP


⎨ 1/√K0

for HP
ωp = √ √

⎪ K1 (if KC ≥ KB ) or K2 (if KC < KB ) for BP


⎩ √ √
1/ K2 (if KC ≥ KB ) or 1/ K1 (if KC < KB ) for BS
(6.20b)
and finally compute
√  √ 
1 1−√k
k = 1 − k2 , q0 = 2 1+ k

q = q0 + 2q05 + 15q09 + 150q013 (6.20c)


100.1Aa − 1 log 16D
D = 0.1Ap , n≥
10 −1 log (1/q)
148 Design Methods Using Analog Filter Theory

3. Determine λ for LP and HP, or B and ω0 for BP and BS.


For LP and HP, compute λ as

⎪ ωp T

⎨ 2 tan(Ω T /2) for LP
p
λ= (6.21)

⎩ 2ωp tan(Ωp T /2) for HP

T
For BP, compute

ω0 = 2 KB /T, B = 2KA /T ωp (6.22)

For BS, compute



ω0 = 2 KB /T, B = 2KA ωp /T (6.23)

4. Form the normalized lowpass transfer function HN (s), see


Section 6.2.1.4.
5. Apply the analog-filter transformation in (6.13), see Section 6.2.2.
6. Apply the bilinear transformation in (6.14).
To illustrate the bilinear-transformation method, consider designing a digital
elliptic bandpass filter with specifications
Ap = 1 dB, Aa = 42 dB
Ωp1 = 1000 rad/s, Ωp2 = 1300 rad/s
Ωa1 = 900 rad/s, Ωa2 = 140 rad/s
ωs = 6000 rad/s
Following the above design steps, we compute
k = 0.609806, ωp = 0.780900, k  = 0.792551
q0 = 0.029030, q = 0.029030, D = 6.120655 × 104
n = 4, ω0 = 1.305887 × 103 , B = 5.684665 × 102
By applying the LP-to-BP transformation followed by the bilinear transfor-
mation, an eighth-order transfer function is obtained as
2
 z 4 + a3j z 3 + a2j z 3 + a1j z 3 + a0j
HD (z) = H0
z 4 + b3j z 3 + b2j z 3 + b1j z 3 + b0j
j=1
6.3 Summary 149

Figure 6.2 Magnitude response of the 8th-order IIR filter.

where H0 = 8.272767 ×10−3 and


⎡ ⎤ ⎡ ⎤
a01 a02 1 1
⎢a11 a12 ⎥ ⎢−1.051264 −1.348091⎥
⎢ ⎥ ⎢ ⎥
⎣a21 a22 ⎦ = ⎣ 1.278797 2.204574 ⎦
a31 a32 −1.051264 −1.348091
⎡ ⎤ ⎡ ⎤
b01 b02 0.797075 0.933566
⎢b11 b12 ⎥ ⎢−1.219252 −1.344800⎥
⎢ ⎥ ⎢ ⎥
⎣b21 b22 ⎦ = ⎣ 2.235302 2.335923 ⎦
b31 b32 −1.366509 −1.393009
The magnitude response of the digital IIR filter is depicted in Figure 6.2.

6.3 Summary
This chapter has shown that the design of IIR filters can be achieved using
analog filter theory in conjunction with bilinear transformation. Several
methods for the design of analog filters have been reviewed. The bilinear
150 Design Methods Using Analog Filter Theory

transform is then introduced and utilized as the key technical component


for the design of stable IIR digital filters. A design procedure for the design
of lowpass, highpass, bandpass, and bandstop IIR digital filters using this
technique is presented. The design procedure is illustrated through an example
of an eighth-order digital elliptic bandpass filter.

References
[1] A. Antoniou, Digital Signal Processing: Signals, Systems, and Filters,
McGraw-Hill, New York, 2006.
[2] T. W. Parks and C. S. Burrus, Digital Filter Design, John Wiley,
New York, 1987.
[3] L. B. Jackson, Digital Filters and Signal Processing, 3rd ed., Kluwer
Academic, Boston, 1996.
7
Design Methods in the Frequency Domain

7.1 Preview
This chapter presents an alternative approach to the design of IIR digital
filters by applying optimization methods [2] where an objective function is
formulated in the frequency domain based on an error between certain desired
and actual magnitude and/or phase responses. Specifically, we present four
methods for the design of stable IIR filters, which are based on mean squared
error minimization, equal-ripple minimization of squared magnitude error,
weighted least squares subject to stability, and minimization of maximum
error subject to stability, respectively. In addition, two Remez exchange type
of techniques for designing an all-pass digital filter to approximate a desired
phase response are also examined.

7.2 Design Methods in the Frequency Domain


Many methods for the design of IIR digital filters in the frequency domain
have been developed since early 1970s. Below we describe several of them
with representative design ideas and techniques.

7.2.1 Minimum Mean Squared Error Design


The method presented below is a slightly modified algorithm proposed by
Steiglitz [1]. Let the IIR filter be expressed in terms of K second-order
cascaded sections as
K
 1 + ak z −1 + bk z −2
H(z, x) = A (7.1)
1 + ck z −1 + dk z −2
k=1
where
 T
x= a1 b1 c1 d1 · · · aK bK cK dK A

151
152 Design Methods in the Frequency Domain

is a vector of 4K + 1 design variables. Let Hd (ω) be the desired frequency


response and ΩL = {ωi | i = 1, 2, · · · , L} be a discrete
 setof frequencies at
which the error of the actual magnitude response H(ejω , x) approximating a
desired magnitude response |Hd (ω)| is evaluated in a mean squared manner:
L 
 2
 
J2 (x) = H(ejωi , x) − |Hd (ωi )| (7.2)
i=1

The design problem is formulated as the problem of finding a vector x∗ that


minimizes the mean-squared type of objective function J2 (x) in (7.2). Many
optimization techniques suitable for the problem at hand exist [2]. These
techniques require the gradient of the objective function. By (7.2), the gradient
of J2 (x) is given by
L
     
∇J2 (x) = 2 ∇ H(ejωi , x) H(ejωi , x) − |Hd (ωi )| (7.3)
i=1
with ⎡   ⎤
∂ H(ejωi , x) ∂a1
⎢ ⎥
⎢   ⎥

⎢ ∂ H(ejωi , x) ∂b1 ⎥


⎢   ⎥

  ⎢ ∂ H(ejωi , x) ∂c1 ⎥
 jωi 
∇ H(e , x) = ⎢
⎢   ⎥


⎢ ∂ H(ejωi , x) ∂d1 ⎥

⎢ ⎥
⎢ .. ⎥
⎢ . ⎥
⎣   ⎦
∂ H(ejωi , x) ∂A
where
 
∂ H(ejωi , x)  
= H(ejωi , x)
∂ak
(1 + ak cos ωi + bk cos 2ωi ) cos ωi + (ak sin ωi + bk sin 2ωi ) sin ωi
(1 + ak cos ωi + bk cos 2ωi )2 + (ak sin ωi + bk sin 2ωi )2
 
∂ H(ejωi , x)  
= H(ejωi , x)
∂bk
(1 + ak cos ωi + bk cos 2ωi ) cos 2ωi + (ak sin ωi + bk sin 2ωi ) sin 2ωi
(1 + ak cos ωi + bk cos 2ωi )2 + (ak sin ωi + bk sin 2ωi )2
7.2 Design Methods in the Frequency Domain 153
 
∂ H(ejωi , x)  
= − H(ejωi , x)
∂ck
(1 + ck cos ωi + dk cos 2ωi ) cos ωi + (ck sin ωi + dk sin 2ωi ) sin ωi
(1 + ck cos ωi + dk cos 2ωi )2 + (ck sin ωi + dk sin 2ωi )2
 
∂ H(ejωi , x)  
= − H(ejωi , x)
∂dk
(1 + ck cos ωi + dk cos 2ωi ) cos 2ωi + (ck sin ωi + dk sin 2ωi ) sin 2ωi
(1 + ck cos ωi + dk cos 2ωi )2 + (ck sin ωi + dk sin 2ωi )2
   
∂ H(ejωi , x) H(ejωi , x)
=
∂A A
A quasi-Newton algorithm [2] can be applied to the design problem where its
kth iteration updates the design variable x k to x k+1 as
x k+1 = x k + αk d k
where
d k = −S k ∇J2 (x k ), αk = arg min J2 (x k + αd k )
α
 
γ S γ
T T
δk δk δ γ T S +S γ δ T
S k+1 = S k + 1 + k T k k − k k kT k k k , S 0 = I 4K+1
γ k δk γ k δk
T γ k δk
δ k = x k+1 − x k , γ k = ∇J2 (x k+1 ) − ∇J2 (x k )
The step size αk is calculated by minimizing the single-variable function
J2 (x k + αd k ). The inexact line search technique initiated by Fletcher is often
found effective to perform this step [2]. The quasi-Newton algorithm starts
with an initial point x 0 that is associated with a stable but trivial IIR filter,
and the iterations continue until |J2 (x k+1 ) – J2 (x k )| is less than a prescribed
tolerance ε.
Since the above optimization is carried out without constraints on filter
stability, it is possible that at convergence some poles are outside the unit
circle. Suppose the ith second-order section of the IIR filter, i.e.
1 + ai z −1 + bi z −2
(7.4)
1 + ci z −1 + di z −2
is found to be unstable. Denoting its two poles by pi1 and pi2 , we can write
the denominator as
1 + ci z −1 + di z −2 = 1 − pi1 z −1 1 − pi2 z −1
154 Design Methods in the Frequency Domain

If only one of the poles, say pi1 , is an unstable pole, then replacing the filter
section in (7.4) with
1 + ai z −1 + bi z −2
−pi1 1 − p−1
i1 z
−1 (1 − p z −1 )
i2

stabilizes the filter section without changing its magnitude response. Similarly,
if both poles are unstable, then replacing the section in (7.4) with
1 + ai z −1 + bi z −2
pi1 pi2 1 − p−1
i1 z
−1 1 − p−1i2 z
−1

stabilizes the section.

As an example, the above algorithm is applied to the design of a sixth-


order lowpass IIR digital filter with normalized passband edge ωp = 0.5π and
stopband edge ωa = 0.575π. The total number of frequency grid points in the
passband and stopband is set to L=70 and the convergence tolerance is set to
ε = 10−6 . The initial point is a 13-component vector x 0 corresponding to a
7-tap averaging FIR filter whose impulse response is {1/7, 1/7, · · · , 1/7}. It
took the algorithm 201 iterations to converge and the magnitude response of
the IIR obtained is depicted in Figure 7.1.

Figure 7.1 Magnitude response of the 6th-order lowpass IIR filter.


7.2 Design Methods in the Frequency Domain 155

7.2.2 An Equiripple Design by Linear Programming


The design technique describe below is based on the algorithm proposed in
[3], see also [4]. Let the transfer function of the IIR filter be given by


M
ai z −i
A(z)
H(z) = = i=0 with b0 = 1 (7.5)
B(z) 
N
bi z −i
i=0
The squared magnitude response of the filter can be expressed as
A(z)A(z −1 )
|H(z)|2 = H(z)H(z −1 ) =
B(z)B(z −1 )
M M 
  
M
ai z −i ai z i ci z −i
i=−M
= i=0 i=0
 N  = N

N  
bi z −i bi z i di z −i
i=0 i=0 i=−N

with ci = c−i and di = d−i . Hence the magnitude-squared function of the


filter is given by

M
c0 + 2 ci cos(iω)
 
H(ejω )2 = C(ω) = i=1
(7.6)
D(ω) N
d0 + 2 di cos(iω)
i=1

Let F (ω) be the desired magnitude-squared function and ε(ω) be a tolerance


function on the approximation error, i.e.,
 
 C(ω) 
 
 D(ω) − F (ω) ≤ ε(ω)

It follows that
C(ω) − [F (ω) − ε(ω)] D(ω) ≥ 0
(7.7)
−C(ω) + [F (ω) + ε(ω)] D(ω) ≥ 0
In addition, (7.6) implies that
C(ω) ≥ 0
(7.8)
D(ω) ≥ 0
156 Design Methods in the Frequency Domain

which are held in the entire baseband. By adding an auxiliary variable η to the
right-hand side of each constraint in (7.7) and (7.8), one seeks for a solution
of the optimization problem

minimize η
subject to: C(ω) − [F (ω) − ε(ω)] D(ω) + η ≥ 0
−C(ω) + [F (ω) + ε(ω)] D(ω) + η ≥ 0 (7.9)
C(ω) + η ≥ 0
D(ω) + η ≥ 0

where F (ω) and ε(ω) are given as design specifications with frequency ω vary-
ing over a dense but finite set of grids in the frequency bands of interest, and
the unknowns are coefficients {ci | i = 0, 1, · · · , M }, {di | i = 0, 1, · · · , N },
and η. Since the objective function and all constraints are linearly dependent
on the design parameters, (7.9) is a linear programming (LP) problem which
can be solved efficiently [2].
Concerning the solution of problem (7.9) for given F (ω) and ε(ω), there
are three possibilities:
(i) Problem (7.9) has a solution with η = 0. This means there exists a
solution satisfying all constraints in (7.7) and (7.8), and hence a stable
IIR filter H(z) satisfying the design specifications can be obtained by
spectrum factorization [5] of the optimized C(ω)/D(ω).
(ii) Problem (7.9) has a solution with η < 0. This means that there exist IIR
filters with design specifications more restrictive than the current error
tolerance. Such designs can be obtained by imposing a more demanding
ε(ω) such that the solution of (7.9) yields η = 0. Once this is achieved, a
stable IIR filter H(z) satisfying the design specifications can be obtained
by spectrum factorization of the optimized C(ω)/D(ω).
(iii) Problem (7.9) has a solution with η > 0. This means that IIR filters with
current specifications do not exist. In order to produce a design, a less
demanding ε(ω) should be used so that the solution of (7.9) yields η = 0.
Once this is achieved, a stable IIR filter is obtained.
A transfer function H(z) satisfying the design specifications can be obtained
by spectrum factorization of the optimized C(ω)/D(ω).
As an example, the algorithm was applied to design a fourth-order IIR
lowpass filter with normalized passband edge ωp = 0.4π and stopband
edge ωa = 0.5π. Suppose it is required that the magnitude responses of
7.2 Design Methods in the Frequency Domain 157

the IIR filter in the passband and stopband vary in the range [1 − δ, 1 + δ]
and [0, δ] respectively, then the passband and stopband squared magnitude
responses, C(ω)/D(ω), will vary in the range [1 + δ 2 − 2δ, 1 + δ 2 + 2δ] and
[0, δ 2 ] respectively. Hence the desired magnitude-squared function F (ω) and
tolerance function ε(ω) are given by

1 + δ 2 for ω ∈ [0, ωp ]
F (ω) = 
δ 2 2 for ω ∈ [ωa , π]

and 
2δ for ω ∈ [0, ωp ]
ε(ω) = 
δ2 2 for ω ∈ [ωa , π]
In the algorithm implementation, a total of 150 frequency grids uniformly
placed in the passband and stopband were used for the first two constraints
in (7.9), while 120 frequency grid uniformly placed on [0, π] were used for
the last two constraints in (7.9). By trial and error, it was found that with
δ = 0.02645, the LP problem in (7.9) has a solution with η = −7.38 × 10−5
which is practically zero. The largest magnitude of the poles of the IIR filter
obtained from the LP solution was found to be 0.8961, and the magnitude
response of the filter is depicted in Figure 7.2.

7.2.3 Weighted Least-Squares Design with Stability Constraints


Stable IIR digital filters that optimally approximate arbitrary magnitude as
well as phase responses in weighted least-squares sense can be designed using
convex quadratic programming (QP). The method described below is similar
in spirit to that reported in [6].
Let the IIR filter be expressed in terms of K second-order cascaded
sections as
K
1 + ak z −1 + bk z −2
H(z, x) = A (7.10)
1 + ck z −1 + dk z −2
k=1
where
 T
x= a1 b1 c1 d1 · · · aK bK cK dK A

is a vector of 4K + 1 design variables. Let Hd (ω) be the desired frequency


response and Ω be frequency region of interest at which the error of the actual
158 Design Methods in the Frequency Domain

Figure 7.2 Magnitude response of the 4th-order lowpass IIR filter.

frequency response H(ejω , x) approximating a desired response Hd (ω) is


evaluated in a weighted least-squares manner:

 2
E2 (x) = W (ω) H(ejω , x) − Hd (ω) dω (7.11)
Ω

where W (ω) ≥ 0 is a weighting function defined over Ω.


The design is accomplished in an iterative manner. Suppose one has a
reasonable initial point x 0 to start, in the kth iteration one can write

H(ejω , x k + δ)  H(ejω , x k ) + gkT (ω)δ (7.12)

provided that δ is small, where g k (ω) is the gradient of H(ejω , x) at x k .


The optimal updating vector δ k is obtained by minimizing

 2
E2 (x k + δ) = W (ω) H(ejω , x k + δ) − Hd (ω) dω
Ω

 2
 W (ω) H(ejω , x k ) + g Tk (ω)δ − Hd (ω) dω
Ω
= δ Qk δ + 2δ T qk + κ
T
7.2 Design Methods in the Frequency Domain 159

where

Qk = W (ω)g k (ω)g H
k (ω) dω
Ω
 
 jω

qk = Re W (ω) H̄(e , xk ) − H̄d (ω) g k (ω) dω
Ω

 2
κ= W (ω) H(ejω , x k ) − Hd (ω) dω
Ω
subject to two constraints: the filter is stable and δ is small in magnitude.
In order to implement the iteration, we use (7.10) to evaluate the gradient
g k (ω) as ⎡  ⎤
∂H(ejω , x k ) ∂a1
⎢  ⎥
⎢ ⎥
⎢ ∂H(ejω , x k ) ∂b1 ⎥
⎢ ⎥
⎢  ⎥
⎢ ∂H(ejω , x ) ∂c ⎥
⎢ k 1 ⎥
g k (ω) = ⎢
⎢  ⎥
⎥ (7.13a)
⎢ ∂H(ejω , x k ) ∂d1 ⎥
⎢ ⎥
⎢ ⎥
⎢ .. ⎥
⎢ . ⎥
⎣  ⎦
∂H(ejω , x k ) ∂A
with
∂H(ejω , x k ) e−jω H(ejω , x k )
=
∂ai 1 + ai e−jω + bi e−j2ω
∂H(ejω , x k ) e−j2ω H(ejω , x k )
=
∂bi 1 + ai e−jω + bi e−j2ω
∂H(ejω , x k ) e−jω H(ejω , x k )
=− (7.13b)
∂ci 1 + ci e−jω + di e−j2ω
∂H(ejω , x k ) e−j2ω H(ejω , x k )
=−
∂di 1 + ci e−jω + di e−j2ω
∂H(ejω , x k ) H(ejω , x k )
=
∂A A
for i = 1, 2, · · · , K. The desired frequency response is typically specified in
terms of desired magnitude response M (ω) and phase response θ(ω), hence
one can write
Hd (ω) = M (ω)ejθ(ω) = M (ω) cos θ(ω) + jM (ω) sin θ(ω) (7.14)
160 Design Methods in the Frequency Domain

To ensure the filter’s stability, the denominator parameters of each 2nd -order
section must lie inside the stability triangle (see Section 3.2.6 of Chapter 3),
namely,
di < 1, ci + di > −1, ci − di < 1
These constraints can be expressed as Eui + e > 0 with
⎡ ⎤ ⎡ ⎤
1 1   1
⎢ ⎥ ci ⎢ ⎥
E = ⎣ −1 1 ⎦, ui = , e=⎣ 1 ⎦
di
0 −1 1

To prevent the poles from being too close to the boundary of the stability
region, one may require that Eui + (1 − τ )e ≥ 0 with a small τ > 0. Let
I i be a selection matrix such that ui = I i x, then the stability constraints are
expressed as E i x + (1 − τ )e ≥ 0 where E i = EI i for i = 1, 2, · · · , K. The
kth iteration of the design is now performed by solving the convex quadratic
programming (QP) problem

minimize δ T Qk δ + 2δ T qk + κ
subject to: E i (x k + δ) + (1 − τ )e ≥ 0 for i = 1, 2, · · · , K (7.15)
|(δ)j | ≤ β for j = 1, 2, · · · , 4K + 1

where β > 0 is a small bound to ensure δ is small. Having obtained a


solution δ k of (7.15), the design vector is updated to x k+1 = x k + δ k , and the
iteration continues until x k+1 − x k  is less than a prescribed convergence
tolerance ε.
As an example, the algorithm was applied to design a fourteenth-order IIR
lowpass filter with normalized passband edge ωp = 0.4π and stopband edge
ωa = 0.45π. The desired frequency response includes a linear phase response
in the passband with group delay being 15.5. In design implementation, a
total of 90 uniformly placed frequency grids were used to evaluate Qk and qk
in (7.15), the weighting function, stability margin, and bound for increment
of design vector were set to W (ω) ≡ 1, τ = 0.12, and β = 0.03,
respectively. With tolerance ε = 0.03, it took the algorithm 62 iterations
to converge and the magnitude and passband phase response of the IIR
filter obtained are depicted in Figure 7.3(a) and (b), respectively. The largest
magnitude of the poles was found to be 0.9381 that ensures the filter’s
stability.
7.2 Design Methods in the Frequency Domain 161

Figure 7.3 The frequency characteristics of the IIR filter. (a) Magnitude response (left side).
(b) Passband phase response (right side).

7.2.4 Minimax Design with Stability Constraints


Convex programming techniques are also useful in designing stable IIR
filters that optimally approximate arbitrary magnitude and phase responses
in minimax sense in that maximum approximation error in the frequency
domain is minimized. The design method described below is based on the
work reported in [6].
Let the IIR filter be expressed in terms of K second-order cascaded
sections as given by (7.10), Hd (ω) be the desired frequency response, and
Ω be frequency region of interest at which the error of the actual frequency
response H(ejω , x) approximating a desired response Hd (ω) is evaluated in
a the following manner:
 
E∞ (x) = maximize W (ω) H(ejω , x) − Hd (ω) (7.16)
ω∈Ω

where W (ω) ≥ 0 is a weighting function defined over Ω. The design is


accomplished in an iterative manner. Suppose one has a reasonable initial
point x 0 to start, in the kth iteration one can write

H(ejω , x k + δ)  H(ejω , x k ) + g Tk (ω)δ (7.17)

provided that δ is small, where g k (ω) is the gradient of H(ejω , x) at x k and
can be evaluated using (7.13a) and (7.13b). The optimal updating vector δk is
obtained by minimizing
162 Design Methods in the Frequency Domain
 
E∞ (x + δ) = maximize W (ω) H(ejω , x + δ) − Hd (ω)
ω∈Ω
 
 maximize W (ω) H(ejω , x k ) + g Tk (ω)δ − Hd (ω)
ω∈Ω
(7.18)
 

By introducing an upper bound η of W (ω) H(e , x k ) + g k (ω)δ − Hd (ω)
jω T

for ω ∈ Ω, minimizing the approximate E∞ (x + δ) in (7.18) can be


expressed as

minimize η
 
subject to: W (ω) g Tk (ω)δ + Dk (ω) ≤ η ω∈Ω

where Dk (ω) = H(ejω , x k ) − Hd (ω) in a known quantity in the kth iteration.


To complete the design formulation, additional constraints concerning the
stability of the filter and smallness of the increment vector δ are also imposed
(see (7.15)) and the kth iteration of the algorithm is carried out by solving the
optimization problem

minimize η
 
subject to: W (ω) g Tk (ω)δ + Dk (ω) ≤ η ω∈Ω
(7.19)
E i (x k + δ) + (1 − τ )e ≥ 0 for i = 1, 2, · · · , K
|δ j | ≤ β for j = 1, 2, · · · , 4K + 1

where scalar bound η and increment vector δ are variables, the last two sets of
constraints are linear, while the first set of constraints are of second-order cone
type [2]. To see this, only a set of dense discrete frequencies Ωd = {ωi | i =
1, 2, · · · , L} ⊂ Ω is considered, and for each ωi ∈ Ωd the first constraint in
(7.19) is written as
   
 W (ωi )g T (ωi ) W (ωi )Drk (ωi ) 
 rk 
 δ+  ≤ η for i = 1, 2, · · · , L
 W (ωi )g Tjk (ωi ) W (ωi )Djk (ωi ) 
(7.20)
where g rk (ωi ) and g jk (ωi ) are the real and imaginary parts of g k (ωi ),
respectively, and Drk (ωi ) and Djk (ωi ) are the real and imaginary parts
of Dk (ω), respectively. Replacing the first set of constraints in (7.19) with
those in (7.20), one obtains a standard second-order cone programming
(SOCP)problem as
7.2 Design Methods in the Frequency Domain 163

minimize η
   
 W (ω )g T (ω ) W (ωi )Drk (ωi ) 
 i rk i 
subject to:  T δ + ≤η
 W (ωi )g jk (ωi ) W (ωi )Djk (ωi ) 
(7.21)
for i = 1, 2, · · · , L

E i (x k + δ) + (1 − τ )e ≥ 0 for i = 1, 2, · · · , K
|(δ)j | ≤ β for j = 1, 2, · · · , 4K + 1

which can be solved efficiently [2]. Having obtained a solution δk of (7.21),


the design vector is updated to x k+1 = x k + δ k , and the iteration continues
until x k+1 − x k  is less than a prescribed convergence tolerance ε.
As an example, the algorithm was applied to design a twelfth-order IIR
lowpass filter with normalized passband edge ωp = 0.5π and stopband edge
ωa = 0.6π. The desired frequency response includes a linear phase response
in the passband with group delay being 12. In design implementation, a total of
L = 80 uniformly placed frequency grids were used in (7.21), the weighting
function, stability margin, and bound for increment of design vector were
set to W (ω) = 1, τ = 0.12, and β = 0.03, respectively. With tolerance
ε = 0.05, it took the algorithm 49 iterations to converge and the magnitude
and passband phase response of the IIR filter obtained are depicted in
Figure 7.4(a) and (b), respectively. The largest magnitude of the poles was
found to be 0.9381 that ensures the filter’s stability.

Figure 7.4 The frequency characteristics of the IIR filter. (a) Magnitude response (left side).
(b) Passband phase response (right side).
164 Design Methods in the Frequency Domain

7.3 Design of All-Pass Digital Filters


7.3.1 Design of All-Pass Filters Based on Frequency Response
Error
The transfer function of an nth-order all-pass digital filter is described by

an + an−1 z −1 + · · · + a1 z −(n−1) + z −n
H(z) =
1 + a1 z −1 + · · · + an−1 z −(n−1) + an z −n
n
ak z k (7.22)
= z −n k=0
n , a0 = 1

ak z −k
k=0

whose frequency response is given by


n

ak ejkω
k=0
H(ejω ) = e−jnω n = ejθ(ω) (7.23)

−jkω
ak e
k=0

where ⎛ ⎞
n

⎜ ak sin kω ⎟
⎜ k=0 ⎟
θ(ω) = −nω + 2 tan−1 ⎜
⎜ n


⎝ ⎠
ak cos kω
k=0

Let d(ω) be a desired phase characteristic. We now define a complex error


function, called frequency response error, as
1 
E(ejω ) = H(ejω ) − Hd (ejω )
2
(7.24)
1  jθ(ω) 
= e − ejd(ω)
2
where Hd (ejω ) = ejd(ω) . By substituting (7.23) into (7.24), we obtain
7.3 Design of All-Pass Digital Filters 165

n

 ak ejkω 
1 −jnω k=0
E(ejω ) = e n − ejd(ω)
2 
ak e−jkω
k=0
n
 % nω+d(ω) nω+d(ω)
&
ak ej(kω− 2 ) − e−j(kω− 2 )
1 −j nω−d(ω) k=0
= e 2 · n
2 
ak e−jkω
k=0
n
 % nω + d(ω) &
ak sin kω −
nω−d(ω) 2
k=0
= je−j 2 · n

ak e−jkω
k=0
(7.25)
From (7.25), the amplitude Ê(ejω ) of the frequency response error in (7.24)
is found to be
n
 % nω + d(ω) &
ak sin kω −
2
Ê(ejω ) = k=0 n  (7.26)
 
 ak e−jkω 
k=0

The Remez algorithm (see Section 5.7) can be applied to the numerator
in (7.26) to minimize the amplitude error in (7.26) iteratively subject to
some linearized constraints as detailed below. We first choose n + 1 initial
values of extremal frequencies ωi for i = 0, 1, · · · , n over a frequency
domain of interest appropriately. Then we design an all-pass digital filter
so that Ê(ejω ) alternates in equiripple with respect to value 0 at n + 1
extremal frequencies over a frequency domain of interest. Hence, (7.26) is
written as
Ê(ejωi ) = (−1)i δ (7.27)
where δ an initial amplitude error. By substituting (7.26) into (7.27) and
multiplying the both sides by |A(ejωi )|, we obtain
166 Design Methods in the Frequency Domain
n
 % nωi + d(ωi ) &
ak sin kωi − = (−1)i δ |A(ejωi )| for i = 0, 1, · · · , n
2
k=0
(7.28)
where
n 
 
|A(ejω )| = ak e−jkω 
k=0
'
(% &2 % &2
( n n
=) ak cos kω + ak sin kω
k=0 k=0

Equation (7.28) can be expressed in matrix form as


⎡ ⎤⎡ ⎤ ⎡ ⎤
φ(0, 0) φ(0, 1) · · · φ(0, n) a0 δ |A(ejω0 )|
⎢ φ(1, 0) φ(1, 1) · · · φ(1, n) ⎥ ⎢ a1 ⎥ ⎢ (−1)δ |A(ejω1 )| ⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ .. .. .. .. ⎥ ⎢ .. ⎥ = ⎢ .. ⎥ (7.29)
⎣ . . . . ⎦⎣ . ⎦ ⎣ . ⎦
φ(n, 0) φ(n, 1) · · · φ(n, n) an n jω
(−1) δ |A(e )|
n

where
% nωi + d(ωi ) &
φ(i, k) = sin kωi −
2
The above simultaneous equations can be solved for the filter coefficients
{ai | i = 0, 1, · · · , n} provided that the locations of the n + 1 extremal
frequencies are known a priori. However, the resulting Ê(ejω ) is not always
equiripple over a frequency domain of interest. To make Ê(ejω ) equiripple,
the n + 1 locations ωi for i = 0, 1, · · · , n are adjusted iteratively over a
frequency domain of interest by employing the Remez exchange algorithm
which is a very efficient iterative procedure for determining the locations of
the extremal frequencies. Step-by-step algorithmic details are given below,
where new extremal frequencies of Ê(ejω ), say ωi , replace previous extremal
frequencies {ωi | i = 0, 1, · · · , n}, and the simultaneous equations in (7.29)
are solved again, and this process continues until

max {|ωi − ωi |} < ε (7.30)


0≤i≤n

is satisfied where ε is a prescribed tolerance.


7.3 Design of All-Pass Digital Filters 167

The Remez Exchange Algorithm


Step 1: Set the order n of an all-pass filter and desired phase characteristic
d(ω).
Step 2: Select the initial values of extremal frequencies {ωi | i = 0, 1, · · · , n}
which are equally spaced over a frequency domain of interest.
Step 3: Set |A(ejωi )| = 1 for i = 0, 1, · · · , n.
Step 4: Compute the coefficients of an all-pass filter {ak | k = 0, 1, · · · , n}
by solving (7.29), then multiply the coefficients found by a scaling
constant to normalize the coefficient a0 to unity.
Step 5: Find the new extremal frequencies {ωi | i = 0, 1, · · · , n} from the
values of |E(ejω )| evaluated at the dense set of frequencies.
Step 6: If (7.30) is satisfied, go to Step 8. Otherwise, replace ωi by ωi for
i = 0, 1, · · · , n and set
n
1  
δ= |Ê(ejωi )|
n+1
i=0

Step 7: Compute |A(ejωi )| for i = 0, 1, · · · , n with the coefficients {ak | k =


0, 1, · · · , n} obtained at the previous pass, and go back to Step 4.
Step 8: Obtain the transfer function H(z) in (7.22).

7.3.2 Design of All-Pass Filters Based on Phase Characteristic


Error
Let the phase characteristic error be defined by

e(ω) = θ(ω) − d(ω) (7.31)


where θ(ω) and d(ω) denote the actual and desired phase characteristics,
respectively. It follows from (7.31) and (7.23) that

eje(ω) = ej[θ(ω)−d(ω)] = H(ejω )e−jd(ω)


n
 n

ak ejkω ak ejϕk (ω)
k=0 k=0
(7.32)
= e−j[nω+d(ω)] · n = n
 
ak e−jkω ak e−jϕk (ω)
k=0 k=0
168 Design Methods in the Frequency Domain

where
nω + d(ω)
ϕk (ω) = kω −
2
From (7.32), the phase characteristic error can be expressed as
⎛ n ⎞

⎜ ak sin ϕk (ω) ⎟

−1 ⎜ k=0

e(ω) = 2 tan ⎜ n ⎟ (7.33)

⎝ ⎠
ak cos ϕk (ω)
k=0

We now apply a Remez algorithm to (7.33) so that the sign of e(ω) alternates
at n + 1 extremal frequencies. We first select n + 1 extremal frequencies
{ωi | i = 0, 1, · · · , n} over a frequency domain of interest appropriately and
define
e(ωi ) = (−1)i δ for i = 0, 1, · · · , n (7.34)
where δ stands for the phase characteristic error. By substituting (7.33) into
(7.34), we arrive at
n
 n
δ 
ak sin ϕk (ωi ) = tan ak (−1)i cos ϕk (ωi )
2 (7.35)
k=0 k=0
i = 0, 1, · · · , n

which can be expressed in matrix form as


δ
P a = tan Qa (7.36)
2
where
⎡ ⎤ ⎡ ⎤
sin ϕ0 (ω0 ) sin ϕ1 (ω0 ) · · · sin ϕn (ω0 ) 1
⎢ sin ϕ0 (ω1 ) sin ϕ1 (ω1 ) · · · sin ϕn (ω1 ) ⎥ ⎢ a1 ⎥
⎢ ⎥ ⎢ ⎥
P =⎢ .. .. . .. ⎥, a = ⎢ .. ⎥
⎣ . . . . . ⎦ ⎣.⎦
sin ϕ0 (ωn ) sin ϕ1 (ωn ) · · · sin ϕn (ωn ) an
⎡ ⎤
cos ϕ0 (ω0 ) cos ϕ1 (ω0 ) ··· cos ϕn (ω0 )
⎢ (−1) cos ϕ0 (ω1 ) (−1) cos ϕ1 (ω1 ) · · · (−1) cos ϕn (ω1 ) ⎥
⎢ ⎥
Q=⎢ .. .. . .. ⎥
⎣ . . . . . ⎦
(−1)n cos ϕ0 (ωn ) (−1)n cos ϕ1 (ωn ) · · · (−1)n cos ϕn (ωn )
7.3 Design of All-Pass Digital Filters 169

Equation (7.36) can be written as

P −1 Qa = λa (7.37)

where
% δ &−1
λ = tan
2
Suppose the matrix P −1 Q has the eigenvalues λ0 , λ1 , · · · , λn and corre-
sponding eigenvectors a0 , a1 , · · · , an where |λ0 | > |λ1 | > · · · > |λn |, then
the eigenvector a0 corresponding to the eigenvalue λ0 is theoretically the
optimal solution, because the phase error δ is related to λ by
1
δ = 2 tan−1 (7.38)
λ
However, it was noted in [9] that the all-pass filter with the coefficient vector
a0 is not always stable. Moreover, it often occurs in this case that the Remez
iterative algorithm either does not converge or converges to an unstable filter.
Under these circumstances, an improved algorithm which takes filter stability
into account is proposed in [9]. The algorithm is outlined below.

Step 1: Set the order n of an all-pass filter and desired phase characteristic
d(ω).
Step 2: Select the initial values of extremal frequencies {ωi | i = 0, 1, · · · , n}
which are equally spaced over a frequency domain of interest.
Step 3: Compute the eigenvalues λ0 , λ1 , · · · , λn and corresponding eigen-
vectors a0 , a1 , · · · , an of P −1 Q where |λ0 | > |λ1 | > · · · >
|λn |.
Step 4: Set l = 0.
Step 5: Check whether the filter with al corresponding to λl is stable or not. If
it is stable and the sign of e(ωi ) alternates at n+1 extremal frequencies,
go to the next step. Otherwise, add l to 1 and go back to the top of
this step.
Step 6: Find the new extremal frequencies {ωi | i = 0, 1, · · · , n} from the
values of e(ω) evaluated at the dense set of frequencies.
Step 7: If (7.30) is satisfied, go to the next step. Otherwise, replace ωi by ωi
for i = 0, 1, · · · , n and go back to Step 3.
Step 8: Obtain the transfer function H(z) in (7.22).
170 Design Methods in the Frequency Domain

7.3.3 A Numerical Example


As an example, the two algorithms studied above were applied to design
an 8th-order all-pass digital filter where the desired phase characteristic is
specified by

− 7ω for |ω| ≤ 0.5π
d(ω) =
− 7ω − π for |ω| ≥ 0.6π

The extremal frequencies were searched over discrete frequency points ωi =


πi/1000 for i = 0, 1, · · · , 1000. Nine initial values of extremal frequencies
were chosen as 0.05π, 0.15π, 0.25π, 0.35π,
* 0.45π, 0.65π, 0.75π, 0.85π, 0.95π
over a frequency domain of interest Ω1 Ω2 where Ω1 = {ω| 0 < ω ≤ 0.5π}
and Ω2 = {ω| 0.6 ≤ ω < π}.
By applying the design method using frequency response error with initial
amplitude of the frequency response error E(ejω ) being δ = 0.1, it took the
Remez exchange algorithm 5 iterations to converge to
⎡ ⎤ ⎡ ⎤
a0 a1 a2 1.00000000 0.15520071 0.48184349
⎣a3 a4 a5 ⎦ = ⎣ −0.07326981 −0.09658253 0.03963666 ⎦
a6 a7 a8 0.03849011 −0.06678163 −0.02050755

The maximum phase error and maximum magnitude of the frequency res-
ponse error were found to be

7.93757266 × 10−2 and 3.96774452 × 10−2

respectively. The resulting phase and phase error characteristics are depicted
in Figure 7.5, and the amplitude characteristic of the frequency response error
is shown in Figure 7.6.
By employing the design method using phase characteristic error, it took
the improved algorithm 5 iterations to converge to
⎡ ⎤ ⎡ ⎤
a0 a1 a2 1.00000000 0.15519789 0.48184637
⎣a3 a4 a5 ⎦ = ⎣ −0.07326919 −0.09658404 0.03963450 ⎦
a6 a7 a8 0.03849024 −0.06677818 −0.02050865

where all ai ’s were scaled to normalize a0 to unity. The maximum phase error
was found to be 7.93644811 × 10−2 .
The resulting phase and phase error characteristics are depicted in
Figure 7.7.
7.4 Summary 171

Figure 7.5 All-pass filter designed by using frequency response error. (a) Phase characteristic.
(b) Phase error characteristic.

Figure 7.6 Amplitude characteristic of the frequency response error.

Figure 7.7 All-pass filter designed with phase characteristic error. (a) Phase characteristic.
(b) Phase error characteristic.

7.4 Summary
This chapter has shown that the design of stable IIR filters can be achieved by
applying optimization methods. In particular, an unconstrained optimization
method based on a quasi-Newton algorithm has been applied to minimize mean
squared error; linear programming has been applied to minimize a squared
172 Design Methods in the Frequency Domain

magnitude error for an equal-ripple design; convex quadratic programming


has been used in weighted least squares design of stable IIR filters; and second-
order cone programming has been used in minimax design of stable IIR filters.
Based on the notion of frequency-response error and phase characteristic error,
respectively, two Remez exchange type of algorithms for the design of all-pass
digital filters have been presented. Design examples have been presented to
illustrate these methods.

References
[1] K. Steiglitz, “Computer-aided design of recursive digital filters,” IEEE
Trans. Audio and Electroacoust., vol. AU-18, no. 2, pp. 123–129, June
1970.
[2] A. Antoniou and W.-S. Lu, Practical Optimization: Algorithms and
Engineering Applications, Springer, New York, 2007.
[3] L. R. Rabiner, N. Y. Graham, and H. D. Helms, “Linear program-
ming design of IIR digital filters with arbitrary magnitude function,”
IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-22, no. 2,
pp. 117–123, Apr. 1974.
[4] L. R. Rabiner and R.-B. Gold, Theory and Application of Digital Signal
Processing, Prentice-Hall, Englewood Cliffs, NJ., 1975.
[5] R. A. Roberts and C. T. Mullis, Digital Signal Processing, Addison-
Wesley, Reading, MA, 1987.
[6] W.-S. Lu and T. Hinamoto, “Optimal design of IIR digital filters
with robust stability using conic-quadratic-programming updates,” IEEE
Trans. Signal Process., vol. 51, no. 6, pp. 1581–1592, June 2003.
[7] M. Ikehara, M. Funaishi and H. Kuroda, “Design of digital all-pass
networks using Remez algorithm,” IEICE Trans. Fundamentals of Elec-
tronics, Communications and Computer Sciences, vol. J74-A, no. 7,
pp. 974–979, July 1991. (in Japanese)
[8] M. Ikehara, M. Funaishi and H. Kuroda, “Design of complex all-pass
networks using Remez algorithm,” IEEE Trans. Circuits Syst. II, vol. 39,
no. 8, pp. 549–556, Aug. 1992.
[9] Y. Toguri and M. Ikehara, “A design method of all-pass networks based
on the eigen filter method with consideration of the stability,” IEICE
Trans. Fundamentals, vol. E78-A, no. 7, pp. 885–889, July 1995.
[10] T. Q. Nguyen, T. I. Laakso and R. D. Koilpillai, “Eigenfilter approach
for the design of allpass filters approximating a given phase response,”
IEEE Trans. Signal Process., vol. 42, no. 9, pp. 2257–2263, Sep. 1994.
8
Design Methods in the Time Domain

8.1 Preview
The problem of designing an IIR digital filter involves determination of the
coefficients ai ’s and bi ’s of a rational transfer function of the form
N (z) b0 + b1 z −1 + · · · + bm z −m
H(z) = = (8.1)
D(z) 1 + a1 z −1 + · · · + an z −n
There are two different approaches for the design of IIR digital filters. One
approach is carried out in the frequency domain, which consists of minimizing
some measure of the difference between the frequency response of the filter
H(ejω ) and a desired frequency response F (ejω ). The other approach is
carried out in the time domain, which consists of minimizing some measure
of the difference between the impulse response of the filter

1 dz
hi = H(z)z i (8.2)
2πj C z
where C is a counterclockwise contour that encircles the origin, and a desired
impulse response fi in a direct way. Typically this direct minimization leads
to a nonlinear problem. IIR filter design problems in the time domain can
be mainly divided into two classes: least-squares approximation problem and
modified least squares problem.
A least-squares approximation problem can be stated as follows. Given
an impulse response sequence {f0 , f1 , f2 , · · · }, find an IIR digital filter of the
form in (8.1) which minimizes


||f − h||2 = (fi − hi )2
i=0 (8.3)
 π
1
= |F (ejω ) − H(ejω )|2 dω
2π −π

173
174 Design Methods in the Time Domain

At a glance (8.3) provides a natural choice of error measure. However, it is ill-


behaved in other respects. First, the problem of minimizing (8.3) with respect
to unknown coefficients {a1 , a2 , · · · , an , b0 , b1 , · · · , bm } in (8.1) is highly
nonquadratic, and is indeed a sophisticated nonlinear programming problem.
Second, (8.3) requires the entire impulse response sequence. The problems in
the literature have generally been specific to a particular input-output record or
have considered only a truncated version of the impulse response sequence.
A modified least squares problem consists of modifying the approximation
problem whose cost function is (8.3) by considering instead a cost function
which is quadratic in the coefficients ai ’s and bi ’s of the IIR digital filter in
(8.1). In the modified problem one seeks the coefficients which minimize the
quadratic form
 π
1
J(a, b) = |F (ejω )D(ejω ) − N (ejω )|2 dω (8.4)
2π −π

where a = (a1 , a2 , · · · , an )T and b = (b0 , b1 , · · · , bm )T . The integral in


(8.4) differs from that in (8.3) by the inclusion of |D(ejω )|2 in the integrand.
The difference between (8.3) and (8.4) is illustrated in Figure 8.1.

Figure 8.1 Time-domain IIR filter design. (a) Output error for least-squares approximation
problem using (8.3). (b) Equation error for modified least-squares problem minimizing (8.4).
8.2 Design Based on Extended Pade’s Approximation 175

Another option for solving the least-squares approximation problem over


the finite interval of the actual and desired impulse responses relies on a
state-space approach where either the Hankel matrix or the controllability and
observability Grammians are utilized to approximate a given finite impulse
response sequence by a state-space model (or equivalently, an IIR digital
filter).
In this chapter, several typical techniques for designing IIR digital filters
in the time domain will be addressed concisely.

8.2 Design Based on Extended Pade’s Approximation


Suppose that the desired transfer function F (z) with a finite sequence {fi | i =
0, 1, · · · , N } of the impulse response is given by

F (z) = f0 + f1 z −1 + f2 z −2 + · · · + fN z −N (8.5)

We seek to design an IIR digital filter whose transfer function H(z) is


described by
b0 + b1 z −1 + · · · + bm z −m
H(z) =
1 + a1 z −1 + · · · + an z −n

 (8.6)
−i
= hi z
i=0

The problem being considered here is to find the coefficients a1 , a2 , · · · , an


and b0 , b1 , · · · , bm of the transfer function H(z) such that the actual impulse
response hi ’s approximates the desired impulse response fi ’s in a certain sense
over the finite interval 0 ≤ i ≤ N provided that m + n ≤ N .
From (8.6), it follows that
k

bk = hk−i ai for 0 ≤ k ≤ m
i=0
(8.7)
min{k,n}

0= hk−i ai for m < k ≤ N
i=0
176 Design Methods in the Time Domain

which is equivalent to
⎡ ⎤ ⎡ ⎤⎡ ⎤
b0 h0 0 ··· 0 1
⎢ b1 ⎥ ⎢ .. ⎥ ⎢ a ⎥
⎢ ⎥ ⎢ h1 h0 . 0 ⎥ ⎢ 1⎥
=
⎢ .. ⎥ ⎢ . .. .. ⎥ ⎢ .. ⎥
⎣ . ⎦ ⎣ .. .
..
. . ⎦⎣ . ⎦
bm hm hm−1 · · · hm−n an
⎡ ⎤ ⎡ ⎤⎡ ⎤ (8.8)
0 hm+1 hm · · · hm−n+1 1
⎢ 0 ⎥ ⎢hm+2 hm+1 · · · hm−n+2 ⎥ ⎢ a1 ⎥
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎢ .. ⎥ = ⎢ .. .. . .. ⎥ ⎢ .. ⎥
⎣.⎦ ⎣ . . . . . ⎦⎣ . ⎦
0 hN hN −1 · · · hN −n an
where hk = 0 for k < 0 and a0 = 1.

8.2.1 A Direct Procedure


By replacing the unknown impulse response hi ’s by the desired one fi ’s in
(8.8) and defining an (N − m) × 1 error vector e1 , (8.8) is changed to
b F1 1
= (8.9)
e1 F2 a
where
T T
a = a1 a2 · · · an , b = b0 b1 · · · bm
⎡ ⎤ ⎡ ⎤
f0 0 ··· 0 fm+1 fm · · · fm−n+1
⎢ f1 f ··· 0 ⎥ ⎢fm+2 fm+1 · · · fm−n+2 ⎥
⎢ 0 ⎥ ⎢ ⎥
F 1 = ⎢ .. .. .. .. ⎥ , F 2 = ⎢ .. .. .. .. ⎥
⎣ . . . . ⎦ ⎣ . . . . ⎦
fm fm−1 · · · fm−n fN fN −1 · · · fN −n
with fk = 0 for k < 0. A quadratic measure J(a) is now defined as

J(a) = eT1 e1 = (F 3 a − f̂ )T (F 3 a − f̂ ) (8.10)

where f̂ and F 3 are (N − m) × 1 and (N − m) × n matrices defined by


⎡ ⎤ ⎡ ⎤
fm fm−1 · · · fm−n+1 fm+1
⎢fm+1 fm · · · fm−n+2 ⎥ ⎢fm+2 ⎥
⎢ ⎥ ⎢ ⎥
F 3 = ⎢ .. .. .. . ⎥ , f̂ = − ⎢ .. ⎥
⎣ . . . .. ⎦ ⎣ . ⎦
fN −1 fN −2 · · · fN −n fN
8.2 Design Based on Extended Pade’s Approximation 177

respectively. By solving the equation


∂J(a)  
= 2F T3 F 3 a − f̂ = 0 (8.11)
∂a
the denominator coefficient vector a which minimizes J(a) in (8.10) can be
obtained as  −1 T
a = F T3 F 3 F 3 f̂ (8.12)
provided that rank F 3 = n. Once the coefficient vector a is found, the
numerator coefficient vector b can be readily derived from the upper portion
of (8.9) as
1
b = F1 (8.13)
a

8.2.2 A Modified Procedure


Instead of an error vector e1 in (8.9), a more general (N + 1) × 1 error vector
e2 is introduced as
b F1 1
+ e2 = (8.14)
0 F2 a
Equation (8.14) can be rearranged in the form

b f0
+ e2 = A0 (8.15)
0 f

where ⎡ ⎤
a0 0 ··· ··· 0 ··· 0
⎢ .. .. .. ⎥
⎢ a1 a0 . . .⎥
⎢. .. ⎥
⎢. .. .. .. .. ⎥
⎢. . . . . .⎥
⎢ ⎥
A0 = ⎢an · · · a1 a0 0 ··· 0⎥
⎢ .. ⎥
⎢ 0 a ··· a a0
..
.⎥
⎢ n 1 . ⎥
⎢. . ⎥
⎣ .. .. ... ..
.
..
. 0⎦
0 · · · 0 an ··· a1 a0
Moreover, an N × 1 error vector between the desired impulse response fi ’s
and the actual one hi ’s over a finite interval 1 ≤ i ≤ N can be defined as

e=f −h (8.16)
178 Design Methods in the Time Domain

where
f = [f1 f2 · · · fN ]T , h = [h1 h2 · · · hN ]T
Replacing the desired vector f by the actual vector h in (8.15) yields
 
b h0 f0 e0
= A0 = A0 − (8.17)
0 h f e

where e0 = f0 − h0 . Since rank A0 = N + 1, matrix A0 is nonsingular.


Hence (8.17) can be written as
f0 e0 b
− = [D 1 D 2 ] = D1 b (8.18)
f e 0

where A−1
0 = [D 1 D 2 ].
The denominator coefficient vector a is determined by (8.12). The problem
considered here is to obtain the numerator coefficient vector b that minimizes
 T  
 e 0 f0 f 0
I(b) = e0 eT W = − D1 b W − D 1 b (8.19)
e f f

where W is an (N + 1) × (N + 1) symmetric positive-definite weighting


matrix specified by the designer. Differentiating (8.19) with respect to vector
b and setting it to null yields
∂I(b)  f0 
= 2 D T1 W D 1 b − D T1 W =0 (8.20)
∂b f
which leads to
f0
b = (D T1 W D 1 )−1 D T1 W (8.21)
f
This is the numerator coefficient vector b that minimizes I(b) in (8.19).

8.3 Design Using Second-Order Information


8.3.1 A Filter Design Method
Given an infinite sequence {fi | i = 0, 1, · · · } of a desired impulse response
in terms of the transfer function
F (z) = f0 + f1 z −1 + f2 z −2 + · · · + fk z −k + · · · (8.22)
8.3 Design Using Second-Order Information 179

the problem being considered here is to find the coefficients a1 , a2 , · · · , an


and b0 , b1 , · · · , bm of a transfer function of the form
N (z) b0 + b1 z −1 + · · · + bm z −m
H(z) = = (8.23)
D(z) 1 + a1 z −1 + · · · + an z −n
so as to minimize a quadratic measure defined by
 π
1
ε= |F (ejω )D(ejω ) − N (ejω )|2 dω (8.24)
2π −π
Applying Parseval’s formula to (8.24) yields
m
 m
 n
 n 
 n
ε= b2k − 2 bk aj fk−j + ai aj r|i−j| (8.25)
k=0 k=0 j=0 i=0 j=0

where


rk = fi fk+i , a0 = 1
i=0
Here, fi and rk are called the first-order information and the second-order
information for the filter in (8.22), respectively. Differentiating (8.25) with
respect to coefficients bk ’s and setting the results to zero leads to
n

bk = aj fk−j for k = 0, 1, · · · , m (8.26)
j=0

which is equivalent to
⎡ ⎤ ⎡ ⎤⎡ ⎤
b0 1 0 ··· 0 f0
⎢ b1 ⎥ ⎢ a1 1 · · · 0⎥ ⎢ f1 ⎥
⎥ ⎢
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ = ⎢ .. .. . . .. ⎥ ⎢ . ⎥ (8.27)
⎣ . ⎦ ⎣ . . . . ⎦ ⎣ .. ⎦
bm am · · · a1 1 fm
Substituting (8.26) into (8.25) yields
n 
 n m 
 n 2
ε= ai aj r|i−j| − aj fk−j
i=0 j=0 k=0 j=0
⎤ ⎡
1 (8.28a)
⎢ a1 ⎥
⎢ ⎥
= [ 1 a1 · · · an ] K(m, n) ⎢ .. ⎥
⎣.⎦
an
180 Design Methods in the Time Domain

where K(m, n) is an (n + 1) × (n + 1) symmetric matrix denoted by


⎡ ⎤
K00 (m, n) K01 (m, n) · · · K0n (m, n)
⎢ K (m, n) K (m, n) · · · K (m, n) ⎥
⎢ 10 11 1n ⎥
K(m, n) = ⎢ . . . . ⎥ (8.28b)
⎣ .
. .
. . . .
. ⎦
Kn0 (m, n) Kn1 (m, n) · · · Knn (m, n)

whose (i, j)th component is given by


m

Kij (m, n) = r|i−j| − fk−i fk−j
k=0
m (8.28c)

= r|i−j| − fk−i fk−j for i, j = 0, 1, · · · , n
k=max{i,j}

with fk = 0 for k < 0.

Lemma 8.1
Let K be a positive semidefinite symmetric matrix, and ψ be a given vector.
The solution x∗ of the problem

min xT Kx subject to xT ψ = 1 (8.29)


x
satisfies Kx∗ = αψ where α is the minimum value of problem (8.29).

Proof
We define the Lagrange function
1 T
J(x, λ) = x Kx − λ(ψ − 1) (8.30)
2
where λ is a Lagrange multiplier, and compute the gradients
⎡ ⎤
∂J(x, λ)  
⎢ ∂x ⎥ Kx − λψ
⎢ ⎥
⎣ ∂J(x, λ) ⎦ = −(xT ψ − 1)
∂λ (8.31)
  
K −ψ x
= T
−ψ 1/λ λ
8.3 Design Using Second-Order Information 181

By setting (8.31) to null, we obtain


   ∗
K −ψ x
=0 (8.32)
−ψ T 1/λ λ∗

where x∗ and λ∗ denote the optimal values of x and λ, respectively. From


(8.32), it follows that

Kx∗ = λ∗ ψ, x∗T Kx∗ = λ∗ x∗T ψ = λ∗ = α (8.33)

which leads to
Kx∗ = αψ (8.34)

This completes the proof of Lemma 8.1. 


By choosing x = [1 a1 · · · an ]T and ψ = [1 0 · · · 0]T in Lemma 8.1,
the optimal coefficients a1 , a2 , · · · , an minimizing (8.28a) must satisfy
⎡ ⎤ ⎡ ⎤
1 1
⎢a ⎥ ⎢0⎥
⎢ 1⎥ ⎢ ⎥
K(m, n) ⎢ . ⎥ = αmn ⎢ . ⎥ (8.35)
⎣ .. ⎦ ⎣ .. ⎦
an 0

where αmn is the nonnegative minimum value of (8.28a). Hence the optimal
coefficient vector a = [a1 , a2 , · · · , an ]T minimizing (8.28a) can be obtained
from second to last equations in (8.35) as

a = −K −1
o g (8.36)

where
⎡ ⎤ ⎡ ⎤
K11 (m, n) K12 (m, n) · · · K1n (m, n) K10 (m, n)
⎢ K (m, n) K (m, n) · · · K (m, n) ⎥ ⎢ K (m, n) ⎥
⎢ 21 22 2n ⎥ ⎢ 20 ⎥
Ko = ⎢ . . . . ⎥, g=⎢ . ⎥
⎣ .. .. .. .. ⎦ ⎣ .. ⎦
Kn1 (m, n) Kn2 (m, n) · · · Knn (m, n) Kn0 (m, n)

It is noted that (8.36) is the optimal solution for a given order n which
minimizes ε in (8.28a) with respect to vector a, because ε is a convex quadratic
function whose minimizer can be found by taking the gradient of ε with
182 Design Methods in the Time Domain

respect to vector a and setting it to null. Moreover, (8.35) plays an important


role later in stability issue as well as derivation of an efficient algorithm for
solving (8.35).
Once the denominator coefficient vector a is found, the numerator
coefficients b0 , b1 , · · · , bm can be readily determined from (8.27).

8.3.2 Stability
We now address the stability of IIR digital filters designed by the method in
Section 8.3.1.

Lemma 8.2 (Lyapunov Stability Theorem)


If (A, b) is a controllable pair, i.e., the matrix G = [b, Ab, · · · , An−1 b] is
nonsingular, and there exist symmetric matrices K and L with K positive
definite and L positive semidefinite such that

K = AKAT + αbbT + L (8.37)

for some α > 0, then the eigenvalues of A all lie in the open unit disk, where
A and b are n × n and n × 1 real matrices, respectively.

Proof
Equation (8.37) can be written as

K = A AKAT + αbbT + L AT + αbbT + L

= A2 K(AT )2 + αAbbT AT + ALAT + αbbT + L


= ···
= An K(AT )n + αGGT + L + ALAT + · · · + An−1 L(AT )n−1
(8.38)
Recall the standard Lyapunov stability theorem in [6], which states that the
matrix B has all eigenvalues in the open unit disk if and only if there exist two
positive definite symmetric matrices V and W for which V = B T V B +W .
By taking V = K and W = αGGT + L + ALAT + · · · +
An−1 L(AT )n−1 , and noticing that G is nonsingular, we conclude that both V
and W are positive definite, hence (8.38) satisfies the condition in the standard
Lyapunov stability theorem, and the eigenvalues of (AT )n must all lie in the
open unit disk. Evidently, these eigenvalues are simply the nth powers of the
eigenvalues of A, and thus the eigenvalues of A must all lie in the open unit
disk. This completes the proof of the lemma. 
8.3 Design Using Second-Order Information 183

Theorem 8.1
Suppose the coefficients of denominator D(z) in (8.23) minimize (8.28a), the
transfer function H(z) = N (z)/D(z) is a stable filter.

Proof
Without loss of generality, we assume that the coordinates for the state space
are chosen so that
⎡ ⎤ ⎡ ⎤
−a1 −a2 · · · −an 1
⎢ 0 ⎥ ⎥ ⎢ ⎥
⎢ ⎢0⎥
A=⎢ ⎥
.. ⎦ , b = ⎢ ⎥ (8.39)
⎣ I n−1 . ⎣ ... ⎦
0 0
where (A, b) is a controllable pair. Form (8.28c), it follows that
Kij (m − 1, n − 1) = Ki+1,j+1 (m, n) for i, j = 0, 1, · · · , n − 1 (8.40)
which is equivalent to
⎡ ⎤
K11 (m, n) K12 (m, n) · · · K1n (m, n)
⎢ K (m, n) K (m, n) · · · K (m, n) ⎥
⎢ 21 22 2n ⎥
K(m − 1, n − 1) = ⎢ .. .. .. .. ⎥ (8.41)
⎣ . . . . ⎦
Kn1 (m, n) Kn2 (m, n) · · · Knn (m, n)

where
K(m − 1, n − 1) =
⎡ ⎤
K00 (m − 1, n − 1) K01 (m − 1, n − 1) ··· K0,n−1 (m − 1, n − 1)
⎢ K10 (m − 1, n − 1) K11 (m − 1, n − 1) ··· K1,n−1 (m − 1, n − 1) ⎥
⎢ ⎥
⎣ .. .. ..
.
.. ⎦
. . .
Kn−1,0 (m − 1, n − 1) Kn−1,1 (m − 1, n − 1) · · · Kn−1,n−1 (m − 1, n − 1)

By post-multiplying (8.41) by AT in (8.39) and using (8.35), we obtain


⎡ ⎤
K10 (m, n) K11 (m, n) · · · K1,n−1 (m, n)
⎢ K (m, n) K (m, n) · · · K ⎥
⎢ 20 21 2,n−1 (m, n) ⎥
K(m − 1, n − 1)AT = ⎢ .. .. .. .. ⎥
⎣ . . . . ⎦
Kn0 (m, n) Kn1 (m, n) · · · Kn,n−1 (m, n)
(8.42)
184 Design Methods in the Time Domain

Moreover, pre-multiplying (8.42) by A in (8.39) and using (8.35) produces

AK(m − 1, n − 1)AT =
⎡ ⎤
K00 (m, n) K01 (m, n) ··· K0,n−1 (m, n)
⎢ K (m, n) K11 (m, n) ··· ⎥
K1,n−1 (m, n)
⎢ 10 ⎥
⎢ . .. .. ⎥ − αmn bbT
..
⎣ .. . . ⎦.
Kn−1,0 (m, n) Kn−1,1 (m, n) · · · Kn−1,n−1 (m, n)
(8.43)

By virtue of (8.28c), we have

Kij (m, n) = Ki,j (m − 1, n − 1) − fm−i fm−j for i, j = 0, 1, · · · , n − 1


(8.44)
which leads to
⎡ ⎤
K00 (m, n) K01 (m, n) ··· K0,n−1 (m, n)
⎢ K10 (m, n) K11 (m, n) ··· K1,n−1 (m, n) ⎥
⎢ ⎥
⎢ .. .. .. .. ⎥ = K(m−1, n−1)−L
⎣ . . . . ⎦
Kn−1,0 (m, n) Kn−1,1 (m, n) · · · Kn−1,n−1 (m, n)
(8.45)

where ⎡ ⎤
fm
⎢ f ⎥ 
⎢ m−1 ⎥
L=⎢ . ⎥ fm fm−1 · · · fm−n+1
⎣ .. ⎦
fm−n+1
By substituting (8.45) into (8.43), we obtain

K(m − 1, n − 1) = AK(m − 1, n − 1)AT + αmn bbT + L (8.46)

Clearly, the symmetric matrix K(m, n) is positive definite provided that


αmn > 0. Since the symmetric matrix K(m−1, n−1) is the lower right n×n
portion of K(m, n), it must also be positive definite. In addition, (A, b) is a
controllable pair and the symmetric matrix L is positive semidefinite. Thus,
based on Lemma 8.2, (8.46) guarantees that all the eigenvalues of A exist
inside the unit disk. It is noted that the eigenvalues of the companion matrix
A in (8.39) are the roots of the polynomial D(z) = det(zI n − A) = 0. This
completes the proof of the theorem. 
8.3 Design Using Second-Order Information 185

8.3.3 An Efficient Algorithm for Solving (8.35)


We now present an algorithm that computes solutions to (8.35) efficiently.
The algorithm provides not only an optimal solution but also a means of
determining filter order n that falls in a reasonable range. In addition, unlike
(8.36), the optimal vector a is calculated without matrix inversion. We shall
focus on the case m = n, then treat the case m = n using a variant of the
algorithm derived for the case of m = n. For simplicity, let K(n) = K(n, n)
and αn = αnn . In this case, (8.28b) and (8.28c) can be expressed as
⎡ ⎤
K00 (n) K01 (n) · · · K0n (n)
⎢ K (n) K (n) · · · K (n) ⎥
⎢ 10 11 1n ⎥
K(n) = ⎢ . .. .. .. ⎥
⎣ .. . . . ⎦
Kn0 (n) Kn1 (n) · · · Knn (n) (8.47)
n

Kij (n) = r|i−j| − fk−i fk−j for i, j = 0, 1, · · · , n
k=max{i,j}

respectively. From (8.47), it follows that


Kij (n + 1) = Kij (n) − fn+1−i fn+1−j
(8.48)
Ki+1,j+1 (n + 1) = Kij (n) for i, j = 0, 1, · · · , n
which are equivalent to
⎡ ⎤
K00 (n + 1) K01 (n + 1) · · · K0n (n + 1)
⎢ K (n + 1) K (n + 1) · · · K (n + 1) ⎥
⎢ 10 11 1n ⎥
⎢ . . . . ⎥
⎣ .. .. .. .. ⎦
Kn0 (n + 1) Kn1 (n + 1) · · · Knn (n + 1)
⎡ ⎤
fn+1
⎢ f ⎥ 
⎢ n ⎥
= K(n) − ⎢ . ⎥ fn+1 fn · · · f1
⎣ .. ⎦
f1
⎡ ⎤
K11 (n + 1) K12 (n + 1) · · · K1,n+1 (n + 1)
⎢ K (n + 1) K22 (n + 1) ⎥
· · · K2,n+1 (n + 1)
⎢ 21 ⎥
⎢ . . .. .. ⎥ = K(n)
⎣ .. .. . . ⎦
Kn+1,1 (n + 1) Kn+1,2 (n + 1) · · · Kn+1,n+1 (n + 1)
(8.49)
186 Design Methods in the Time Domain

respectively. Hence
⎡ ⎤ ⎡ ⎤
rn+1 fn+1
⎢ K(n) rn ⎥ ⎢ ⎥ 
⎢ ⎥ ⎢ fn ⎥
K(n + 1) = ⎢ .. ⎥ − ⎢ . ⎥ fn+1 fn · · · f0
⎣ . ⎦ ⎣ .. ⎦
rn+1 rn · · · r0 f0
 
dn0 d(n)T
=
d(n) K(n)
(8.50)
where
T
d(n) = dn1 dn2 · · · dn,n+1
n+1

dnl = rl − fk fk−l for l = 0, 1, · · · , n + 1
k=l

We now define three vectors


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
a0 (n) p0 (n) q0 (n)
⎢ a (n) ⎥ ⎢ p (n) ⎥ ⎢ q (n) ⎥
⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥
a(n) = ⎢ . ⎥ , p(n) = ⎢ . ⎥ , q(n) = ⎢ . ⎥
⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦
an (n) pn (n) qn (n)
(8.51)
such that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
αn 0 fn
⎢0⎥ ⎢ .. ⎥ ⎢f ⎥
⎢ ⎥ ⎢ ⎥ ⎢ n−1 ⎥
K(n)a(n) = ⎢ . ⎥ , K(n)p(n) = ⎢ . ⎥ , K(n)q(n) = ⎢ . ⎥
⎣ .. ⎦ ⎣0⎦ ⎣ .. ⎦
0 1 f0
(8.52)
where a0 (n) = 1. Using (8.50), it can be proved that

a(n) 0 0
a(n + 1) = − βn − γn
0 p(n) q(n)
0 1
p(n + 1) = − a(n + 1)θ̃n
p(n) αn+1
8.3 Design Using Second-Order Information 187

0 1
q(n + 1) = − a(n + 1)φ̃n
q(n) αn+1 (8.53)
αn+1 = αn − θ̃n βn − φ̃n γn
yield
⎡ ⎤
αn+1
⎢ 0 ⎥
⎢ ⎥
K(n + 1)a(n + 1) = ⎢ .. ⎥
⎣ . ⎦
0
⎡ ⎤ ⎡ ⎤
0 fn+1
⎢ .. ⎥ ⎢ fn ⎥
⎢ ⎥ ⎢ ⎥
K(n + 1)p(n + 1) = ⎢ . ⎥ , K(n + 1)q(n + 1) = ⎢ .. ⎥
⎣0⎦ ⎣ . ⎦
1 f0
(8.54)
where 
βn = rn+1 rn · · · r1 a(n)

γn = − fn+1 fn · · · f1 a(n)

θ̃n = d(n)T p(n), φ̃n = d(n)T q(n) − fn+1


The numbers βn and γn are the errors in the predicted values of rn+1
and fn+1 , respectively, based on the nth approximation. If both of these
numbers vanish then the (n + 1)st approximation is equivalent to the
nth one.
To examine the intermediate variables in (8.53), we define
 
pn (n) qn (n)
D(n) =
qn (n) δn

δn = 1 + fn fn−1 · · · f0 q(n) (8.55)

θn βn
= D(n)
φn γn
By noting that
qn (n) = q(n)T K(n)p(n)
 (8.56)
= fn fn−1 · · · f0 p(n)
188 Design Methods in the Time Domain

it is easy to show that both θ̃n − θn and φ̃n − φn exclude the variables rn+1
and fn+1 , and thus are independent of rn+1 and fn+1 . Moreover, by using
(8.52) and (8.56), matrix D(n) in (8.55) can be expressed as

0 0 0 ··· 0 1 
D(n) = + p(n) q(n)
0 1 fn · · · f1 f0
⎡ ⎤
0 fn
(8.57)
0 0 0 ··· 0 1 ⎢ .. .. ⎥
⎢ ⎥
= + K(n)−1 ⎢ . . ⎥
0 1 fn · · · f1 f0 ⎣0 f1 ⎦
1 f0

hence D(n) is positive definite so long as αn > 0, i.e., det K(n) = 0.


Alternatively, from (8.55) αn+1 in (8.53) can be written as
 
 θn  θ̃n − θn
αn+1 = αn − βn γn − βn γn
φn φ̃n − φn
  (8.58)
 βn  θ̃n − θn
= αn − βn γn D(n) − βn γn
γn φ̃n − φn

We now show that θ̃n = θn and φ̃n = φn by the method of “proof by


contradiction”. Suppose either θ̃n = θn or φ̃n = φn holds, then from (8.54)
appropriate rn+1 and fn+1 could be chosen such that
 
βn 1 −1 θ̃n − θn
= − D(n) (8.59)
γn 2 φ̃n − φn

which would imply that


 βn
αn+1 − αn = βn γn D(n) > 0, or equivalently αn+1 > αn
γn
(8.60)
that contradicts with the fact that αn+1 ≤ αn must holds for
consistent values of rn+1 and fn+1 which satisfy a certain Cayley-
Hamilton’s Theorem indirectly. Hence it can be concluded that θ̃n = θn
and φ̃n = φn .
8.3 Design Using Second-Order Information 189

From (8.53) and (8.54), it follows that

φn = −q0 (n + 1)αn+1
T
= −q(n + 1)T αn+1 0 · · · 0
(8.61)
= −q(n + 1)T K(n + 1)a(n + 1)

= − fn+1 fn · · · f0 a(n + 1)

By virtue of (8.55), (8.53) and (8.61), we obtain



δn+1 = 1 + fn+1 fn · · · f0 q(n + 1)
 
 0 1
= 1 + fn+1 fn · · · f0 − a(n + 1)φn
q(n) αn+1
φ2
= δn + n
αn+1
(8.62)
An efficient algorithm for solving (8.35) for the case of m = n can now be
summarized as follows.
Initialization:
a(0) = a0 (0) = 1
α0 = r0 − f02
p(0) = p0 (0) = 1/α0 (8.63)
q(0) = q0 (0) = f0 /α0
δ0 = 1 + f02 /α0
Recursion: 
βn = rn+1 rn · · · r1 a(n)

γn = − fn+1 fn · · · f1 a(n)
θn pn (n) qn (n) βn
=
φn qn (n) δn γn
 θn
αn+1 = αn − βn γn
φn
190 Design Methods in the Time Domain

φ2n
δn+1 = δn +
αn+1
a(n) 0 0
a(n + 1) = − βn − γn
0 p(n) q(n)
0 (8.64)
θn
p(n + 1) = − a(n + 1)
p(n) αn+1
0 φn
q(n + 1) = − a(n + 1)
q(n) αn+1
where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
a0 (n) p0 (n) q0 (n)
⎢ a (n) ⎥ ⎢ p (n) ⎥ ⎢ q (n) ⎥
⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥
a(n) = ⎢ . ⎥ , p(n) = ⎢ . ⎥ , q(n) = ⎢ . ⎥
⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦
an (n) pn (n) qn (n)

This process continues until

|αn+1 − αn | < ε (8.65)

is satisfied where ε > 0 is a prescribed tolerance. If the recursion is terminated


at step n + 1, we set a = a(n + 1) and claim it to be a solution.
Once the denominator coefficient vector a is found, the numerator
coefficients b0 , b1 , · · · , bm can be readily determined from (8.27).
It is known [4] that the recursive algorithm shown in (8.63)–(8.65) can be
applied to the case where m = n by performing a certain number of left or
right shifts to the original impulse response.

8.4 Least-Squares Design


The time-domain design of IIR digital filters involves obtaining the coefficients
of an nth-order transfer function of the form
b0 + b1 z −1 + · · · + bn z −n
H(z, a, b) =
1 + a1 z −1 + · · · + an z −n
∞ (8.66)

−k
= h0 + hk (a, b)z
k=1
8.4 Least-Squares Design 191

where
a = [a1 a2 · · · an ]T , b = [b0 b1 · · · bn ]T , h0 = b0
With the N terms of the impulse response in (8.66), an N × 1 vector is defined
as ⎡ ⎤
h1 (a, b)
⎢ h2 (a, b) ⎥
⎢ ⎥
h(a, b) = ⎢ .. ⎥ (8.67)
⎣ . ⎦
hN (a, b)
An error vector between the desired impulse response fi ’s and the actual one
hi (a, b)’s is then defined by

e(a, b) = f − h(a, b) (8.68)

where f is defined in (8.16), and h0 is chosen as h0 = f0 . We seek the


coefficient vectors a and b of the transfer function H(z, a, b) which minimize
J(a, b) = e(a, b)T e(a, b) = ||e(a, b)||2 (8.69)

provided that 2n ≤ N .
To this end, we write the two equations in (8.8) as
b H1 1
= (8.70)
0 H2 a

where it is assumed that m = n and


⎡ ⎤ ⎡ ⎤
h0 0 ··· 0 hn+1 hn ··· h1
⎢ ..
. .. ⎥
. ⎢hn+2 hn+1 ··· h2 ⎥
⎢ h1 h0 ⎥ ⎢ ⎥
H1 = ⎢ . . ⎥, H 2 = ⎢ .. .. .. .. ⎥
⎣ .. .. . .. 0 ⎦ ⎣ . . . . ⎦
hn hn−1 · · · h0 hN hN −1 · · · hN −n

with hi ’s replacing terms hi (a, b).


Hence, if the vectors a and h(a, b) are known, the vector b can be
determined directly from the upper portion of (8.70), i.e.,

1
b = H1
a
192 Design Methods in the Time Domain

The lower partition of (8.70) can be used to find a linear estimate of vector a.
Replacing the unknown terms of h(a, b) in matrix H 2 by the given terms of
f produces a linear equation error as

1
d(a) = F 2 = Ga − g (8.71)
a

where
⎡ ⎤
fn+1 fn ··· f1
⎢fn+2 fn+1 ··· f2 ⎥
⎢ ⎥
F 2 = ⎢ .. .. .. .. ⎥
⎣ . . . . ⎦
fN fN −1 · · · fN −n
⎡ ⎤ ⎡ ⎤
fn fn−1 ··· f1 fn+1
⎢ fn+1 fn ⎥
··· f2 ⎢fn+2 ⎥
⎢ ⎥ ⎢ ⎥
G = ⎢ .. .. ⎥,
.. .. g = − ⎢ .. ⎥
⎣ . . ⎦ . . ⎣ . ⎦
fN −1 fN −2 · · · fN −n fN

Differentiating ||d(a)||2 = d(a)T d(a) with respect to vector a and setting it


to null yields
∂||d(a)||2  
= 2 GT Ga − GT g = 0 (8.72)
∂a
which leads to
a = (GT G)−1 GT g (8.73)
This can be used below as an initial estimate a(0) in the iterative algorithm to
obtain the suboptimal value of vector a.
Now, (8.71) can be rearranged in the form

d(a) = A(a)f (8.74)

where A(a) is an (N − n) × N matrix defined by


⎡ ⎤
an · · · a1 1 0 ··· 0
⎢ . . .. ⎥
⎢ 0 an · · · a1 1 . .⎥
A(a) = ⎢ .. ⎥
⎣ .
..
.
..
.
.. ..
. . 0⎦
0 · · · 0 an · · · a1 1
8.4 Least-Squares Design 193

By substituting vector f in (8.68) into (8.74), we obtain

d(a) = A(a)h(a, b) + A(a)e(a, b) (8.75)

Note that the lower partition of (8.70) can be rearranged in the form

1
H2 = A(a)h(a, b) = 0 (8.76)
a

Hence (8.75) is reduced to

d(a) = A(a)e(a, b) (8.77)

The expression in (8.77) relates the linear equation error d(a) to the error
e(a, b).
We now proceed to develop an iterative design procedure. We begin by
writing an inverse relation of (8.77) between the errors as

e(a, b∗ ) = W (a)d(a) (8.78)

where e(a, b∗ ) is the error vector corresponding to the optimal numerator


coefficients b∗ for a given a, and the N × n matrix W (a) is a function of a.
The projection theorem in [8] states that for a given a, the error e(a, b∗ )
corresponding to an optimal b∗ is orthogonal to h(a, b). The expression in
(8.76) means that the column vectors of A(a)T are orthogonal to h(a, b).
Also, they are linearly independent and span the entire N -dimensional space.
Hence e(a, b∗ ) can be expressed as a linear combination of the columns of
A(a)T , that is,
e(a, b∗ ) = A(a)T γ(a) (8.79)
where γ(a) is an (N − n) × 1 vector. Since (8.77) holds for all e(a, b) which
includes e(a, b∗ ), substituting (8.79) into (8.77) gives

d(a) = A(a)A(a)T γ(a) (8.80)

Evidently, matrix A(a)A(a)T is nonsingular. Thus, from (8.80) it follows


that −1
γ(a) = A(a)A(a)T d(a) (8.81)
Substituting (8.81) into (8.79) and comparing it with (8.78), W (a) can be
expressed as
−1
W (a) = A(a)T A(a)A(a)T (8.82)
194 Design Methods in the Time Domain

As a result, the problem of minimizing (8.69) is now converted into

min ||e(a, b)||2 = min ||W (a)d(a)||2 (8.83)


a,b a

where d(a) is the linear equation error given by (8.71). By letting

W (a(k−1) )d(a(k) ) = W (a(k−1) )(Ga(k) − g) (8.84)

we compute

∂||W (a(k−1) )d(a(k) )||2  


(k)
= 2 GT W (a(k−1) )T W (a(k−1) ) Ga(k) − g
∂a
(8.85)
Setting (8.85) to null, we obtain
−1
a(k) = GT W (a(k−1) )T W (a(k−1) )G GT W (a(k−1) )T W (a(k−1) )g
(8.86)
This iteration process continues until
 
 (k−2) (k−1) 2 (k−1) (k) 2 
 ||W (a )d(a )|| − ||W (a )d(a )|| <ε (8.87)

is satisfied where ε > 0 is a prescribed tolerance. When the above iteration


algorithm using (8.86) is complete, the resulting a(k) is deemed to be the
suboptimal solution.
At this point, the second phase of the algorithm begins in order to identify
a stationary point of ||e(a, b)||. By (8.83), it follows that

||e(a, b)||2 = ||W (a)d(a)||2 = d(a)T W (a)T W (a)d(a) (8.88)

which leads to
∂||e(a, b)||2  ∂d(a)T  ∂W (a) T 
=2 W (a)T + d(a) W (a)d(a)
∂ai ∂ai ∂ai

= 2 eTi GT W (a)T + li (a)T W (a)(Ga − g)
(8.89)
where ai and ei denote the ith element of n × 1 vector a and the ith column
of n × n identity matrix I n , respectively, and
∂W (a)
li (a) = d(a)
∂ai
8.5 Design Using State-Space Models 195

Using (8.82), we have


W (a)A(a)A(a)T = A(a)T (8.90)
which leads to
∂W (a)  ∂A(a)T ∂A(a) ∂A(a)T 
= − W (a) A(a)T − W (a)A(a)
∂ai ∂ai −1 ∂ai ∂ai
· A(a)A(a)T
(8.91)
where ∂A(a)/∂ai is simply the matrix A(a) with unity’s replacing each
and every ai and setting the rest of the components to zero. The derivatives
involved in computing li (a) in (8.89) are calculated in the same way.
It is clear from (8.89) that
∂||e(a, b)||2 
= 2 GT W (a)T + L(a)T W (a)(Ga − g)
∂a (8.92)

= 2U (a)(Ga − g)
where 
L(a) = l1 (a), l2 (a), · · · , ln (a)

U (a) = GT W (a)T + L(a)T W (a)
By letting
U (a(k−1) )(Ga(k) − g) = 0 (8.93)
we obtain −1
a(k) = U (a(k−1) )G U (a(k−1) )g (8.94)
This iteration process continues until
||a(k) − a(k−1) ||2 < ε (8.95)
is satisfied for a prescribed tolerance ε > 0. As the iterate ak converges to a
vector a, the gradient of ||e(a, b)|| with respect to a and b is expected to be
practically zero. A step-by-step summary of the algorithm is given below.
Given a set of FIR filter coefficients f0 , f1 , · · · , fN
Sept 1: Find an initial estimate a(0) using (8.73).
Sept 2: Continue the iteration process in (8.86) until (8.87) is satisfied.
Sept 3: Continue the iteration process in (8.94) until (8.95) is satisfied.
Sept 4: Determine e(a∗ , b∗ ) from (8.78).
Sept 5: Compute h(a∗ , b∗ ) from (8.68).
Sept 6: Obtain the optimal b∗ from the upper portion of (8.70).
196 Design Methods in the Time Domain

8.5 Design Using State-Space Models


8.5.1 Balanced Model Reduction
An N th-order FIR digital filter is written in the form

F (z) = f0 + F1 (z) (8.96)

where
F1 (z) = f1 z −1 + f2 z −2 + · · · + fN z −N
The FIR digital filter in (8.96) can be represented by a state-space model
(A, b, c, d)N as
x(k + 1) = Ax(k) + bu(k)
(8.97)
y(k) = cx(k) + du(k)
where x(k) is an N × 1 state-variable vector, u(k) is a scalar input, y(k) is a
scalar output, and
⎡ ⎤ ⎡ ⎤
0 ··· 0 0 1
⎢ . . .. .. ⎥ ⎢ 0 ⎥
⎢ 1 . . . ⎥ ⎢ ⎥
A=⎢ . . ⎥, b = ⎢ .. ⎥
⎣ .. . . 0 0 ⎦ ⎣ . ⎦
0 ··· 1 0 0

c = f1 f2 · · · fN , d = f0
The Hankel matrix of the filter in (8.96) is written in the form
⎡ ⎤
f1 f2 · · · fN
⎢ ..
..
. ⎥
⎢f . 0⎥
H N,N = ⎢ .2 .. ⎥ (8.98)
⎣ .. f ..
.
. ⎦
N
fN 0 · · · 0
which is equivalent to
H N,N = U N V N (8.99)
where
⎡ ⎤
c
 ⎢ cA ⎥
⎢ ⎥
V N = b Ab · · · AN −1 b and U N = ⎢ .. ⎥
⎣ . ⎦
cAN −1
8.5 Design Using State-Space Models 197

are called the controllability matrix and the observability matrix, respectively.
The Hankel matrix in (8.98) is symmetric, and can be factorized using
eigenvalue-eigenvector decomposition as
H N,N = P ΣP T (8.100)
where Σ and P are N × N diagonal and orthogonal matrices consisting of
the eigenvalues and eigenvectors of H N,N , respectively, and P P T = I N .

Lemma 8.3
For the system (A, b, c, d)N in (8.97), the controllability matrix V N and the
controllability Grammian K c are unit matrices, i.e.,
V N = Kc = IN (8.101)
and the coordinate transformation defined by
1
T = P Σ− 2 (8.102)
will lead to a balanced realization of the system.

Proof
By simply substituting A and b in (8.97) into V N in (8.99) and recalling that
the controllability Grammian is defined by


Kc = Ai bbT (Ai )T
i=0

we arrive at (8.101). From (8.99)-(8.101), it follows that


H N,N = U N V N = U N = P ΣP T (8.103)
Also recalling that observability Grammian is defined by


Wo = (Ai )T cT cAi
i=0

and using (8.103), we obtain


W o = U TN U N = P Σ2 P T (8.104)
Hence, since K c = T −1 K c T −T and W o = T T W o T , (8.102) produces
Kc = W o = Σ (8.105)
198 Design Methods in the Time Domain

This means that if the coordinate transformation is specified by (8.102), the


resulting equivalent system (A, b, c, d)N is balanced where

A = T −1 AT , b = T −1 b, c = cT

This completes the proof of Lemma 8.3. 


Now, assume that

Σ = diag{σ1 , σ2 , · · · , σN } and σn  σn+1 (8.106)

where 1 ≤ n < N . The following theorem is helpful to obtain a reduced-order


subsystem which approximates the whole system in a certain sense.

Theorem 8.2
Suppose the Hankel matrix H N,N of an N th-order FIR filter (A, b, c, d)N
in (8.97) is factorized as (8.100), an nth-order reduced balanced system is
equivalent to the subsystem (A11 , b1 , c1 , d)n where

A11 = P T1 AP 1 , b1 = P T1 b, c1 = cP 1

and P 1 is an N × n matrix obtained from the partition



P = P1 P2

Proof
If Σ is partitioned as
Σ1 0
Σ= (8.107)
0 Σ4
where

Σ1 = diag{σ1 , σ2 , · · · , σn }, Σ4 = diag{σn+1 , σn+2 , · · · , σN }

then use of (8.102) yields balanced realization (A, b, c)N as


⎡ 1 ⎤
−1 1
−1
Σ12 P T1 AP 1 Σ1 2 Σ12 P T1 AP 2 Σ4 2
A=⎣ 1

−1 1
−1
Σ42 P T2 AP 1 Σ1 2 Σ42 P T2 AP 2 Σ4 2
⎡ 1 ⎤ (8.108)
T
Σ1 P 1 b
2
−1 −1
b=⎣ 1 ⎦, c = cP 1 Σ1 2 cP 2 Σ4 2
Σ42 P T2 b
8.5 Design Using State-Space Models 199

whose nth-order subsystem (A11 , b1 , c1 )n is specified by


1
− 12
A11 = Σ12 P T1 AP 1 Σ1
1
(8.109)
−1
b1 = Σ1 P T1 b,
2
c1 = cP 1 Σ1 2
The transfer function of the nth-order subsystem is described by
H(z) = c1 (zI n − A11 )−1 b1 + d
(8.110)
= c1 (zI n − A11 )−1 b1 + d

This completes the proof of the theorem. 

Remark 8.1
Due to the special structure of matrices A, b, and c in (8.97), it can be shown
that
A11 = P (2 : N, 1 : n)T P (1 : N − 1, 1 : n)
(8.111)
b1 = P (1, 1 : n)T , c1 = cP (1 : N, 1 : n)
where P (i : j, k : m) denotes an extraction of matrix P ’s rows from i to j
and its columns from k to m.
A step-by-step summary of the algorithm is given below.
Given a set of FIR digital filter coefficients f0 , f1 , · · · , fN
Sept 1: Construct the Hankel matrix H N,N in (8.98).
Sept 2: Decompose the Hankel matrix H N,N to obtain Σ and P as shown in
(8.100).
Sept 3: Choose a derired order n of approximation according to the magnitudes
of the elements of Σ as shown in (8.106).
Sept 4: Calculate matrices A11 , b1 , and c1 using (8.111).
Sept 5: Convert the state-space parameters A11 , b1 , and c1 into the transfer
function form H(z) where d = f0 (see Faddeev’s formula in
Section 4.3.2).

8.5.2 Stability and Minimality


The controllability and observability Grammians of the balanced realization,
which are both equal to Σ, are given by the unique positive-definite solution
to the Lyapunov equations
T T
AΣA − Σ = −b b (8.112a)
200 Design Methods in the Time Domain
T
A ΣA − Σ = −cT c (8.112b)
Theorem 8.3
For the balanced realization (A, b, c)N in (8.108), we have

||A|| ≤ 1 (8.113)

where ||A|| denotes the spectral norm of matrix A. If Σ has distinct diagonal
entries, then strict inequality holds in (8.113).

Proof
T
By multiplying (8.112a) from the left by A and from the right by A and
adding the result to (8.112b), we obtain
T T T T
A AΣA A − Σ = −(A b b A + cT c) (8.114)
T
Let λ be an eigenvalue of A A and let v be the corresponding eigenvector,
T
i.e., A Av = λv. Multiplying (8.114) from the left by v H and from the right
by v yields
T T
(|λ|2 − 1)v H Σv = −(v H A b b Av + v H cT cv) ≤ 0 (8.115)

Since v H Σv > 0, it follows that

|λ|2 ≤ 1 (8.116)
T
Since λ is an arbitrary eigenvalue of A A, the spectral norm of A is defined
as 
T
||A|| = λmax (A A) (8.117)
and by virtue of (8.116), we arrive at (8.113).
Assume that ||A|| = 1. Then, from (8.117) it follows that λ = 1 is an
T T
eigenvalue of A A. Hence matrix A A − I N is singular. Letting V be a
T
basis matrix for the right nullspace of A A − I N , it follows that
T
(A A − I N )V = 0 (8.118)

By multiplying (8.114) from the left by V H and from the right by V and then
using (8.118), we arrive at
T
b AV = 0, cV = 0 (8.119)
8.5 Design Using State-Space Models 201

Multiplying (8.114) from the right by V and employing (8.118) and (8.119)
gives
T
(A A − I N )ΣV = 0 (8.120)
T
This means that ΣV is in the right nullspace of A A − I N and there exists
 such that
a nonsingular matrix Σ

ΣV = V Σ (8.121)
 is the restriction of Σ to the space spanned by V , it is possible to
Since Σ
choose V such that Σ  is diagonal. By multiplying (8.112a) from the right by
AV and using (8.118), (8.119) and (8.121), we obtain

ΣAV = AV Σ (8.122)
 and let ṽ be the corresponding column of
Let σ̃ be any diagonal entry of Σ
V . Then (8.121) and (8.122) provide

Σṽ = σ̃ ṽ, ΣAṽ = σ̃Aṽ (8.123)

Therefore, both ṽ and Aṽ are eigenvectors of Σ corresponding to the


eigenvalue σ̃. If Σ has distinct eigenvalues, then ṽ and Aṽ must be parallel,
i.e., there exists an α such that

Aṽ = αṽ (8.124)

From (8.119), it follows that


cṽ = 0 (8.125)
It is obvious that (8.124) and (8.125) contradict the assumption that the system
in (8.97) is observable, since (8.98) and (8.99) hold. Hence ||A|| cannot be
equal to unity when Σ has distinct diagonal entries. This completes the proof
of the theorem. 
Equation (8.108) shows that the balanced realization (A, b, c)N can be
partitioned as
   
A11 A12 b1 
A= , b= , c = c1 c2 (8.126)
A21 A22 b2

when the controllability and observability Grammians are partitioned as Σ =


Σ1 ⊕ Σ4 . In this case, the Lyapunov equations in (8.112a) and (8.112b) can
202 Design Methods in the Time Domain

be written as
   T    
A11 A12 Σ1 0 A11 A12 Σ1 0 b1  T T

− =− b1 b2
A21 A22 0 Σ4 A21 A22 0 Σ4 b2
(8.127a)
 T       
A11 A12 Σ1 0 A11 A12 Σ1 0 cT1 
− =− c1 c2
A21 A22 0 Σ4 A21 A22 0 Σ4 cT2
(8.127b)
respectively.

Theorem 8.4
Suppose a system (A, b, c)N in (8.108) is asymptotically stable and either the
controllability Grammian or the observability Grammian is nonsingular and
diagonal, every subsystem is asymptotically stable.

Proof
Assume that the controllability Grammian is nonsingular, diagonal, and equal
to Σ and that the system is partitioned as in (8.126). Then it will be shown
that the subsystem (A11 , b1 , c1 )n is asymptotically stable.
The upper left equation in (8.127a) becomes
T T T
A11 Σ1 A11 + A12 Σ4 A12 − Σ1 = −b1 b1 (8.128)
T
Let λ be an eigenvalue of A11 and let v be the corresponding eigenvector,
T
i.e., A11 v = λv. Multiplying (8.128) from the left by v H and from the right
by v yields
T T
(|λ|2 − 1)v H Σ1 v = −(v H A12 Σ4 A12 v + v H b1 b1 v) ≤ 0 (8.129)
Since v H Σ1 v > 0, it follows that
|λ| ≤ 1 (8.130)
Suppose |λ| = 1, since Σ4 is positive definite, it follows from (8.129) that
v H A12 = 0, v H b1 = 0 (8.131)
Hence
   
 A11 A12 ∗
  b1
vH 0 =λ vH 0 , vH 0 = 0 (8.132)
A21 A22 b2
8.5 Design Using State-Space Models 203

These contradict the assumption that the system in (8.126) is control-


lable. Therefore, |λ| = 1 which means that |λ| < 1, i.e., the subsystem
(A11 , b1 , c1 )n is asymptotically stable. This completes the proof of the
theorem. 
Theorem 8.4 states that if the system is balanced, then every subsystem
is asymptotically stable. Assume that σmin (Σ1 ) is the smallest eigenvalue of
Σ1 and σmax (Σ4 ) is the largest eigenvalue of Σ4 .

Theorem 8.5
Let the partitioning in (8.126) be performed so that σmin (Σ1 ) > σmax (Σ4 ).
The subsystem (A11 , b1 , c1 )n is then controllable and observable.

Proof
The upper left equation in (8.127b) becomes
T T
A11 Σ1 A11 + A21 Σ4 A21 − Σ1 = −cT1 c1 (8.133)

Assume that the subsystem (A11 , b1 , c1 )n is not observable. Then there exists
an eigenvalue λ of A11 with corresponding eigenvector v such that

A11 v = λv, c1 v = 0, ||v|| = 1 (8.134)

By multiplying (8.133) from the left by v H and from the right by v, we obtain
T
(1 − |λ|2 )v H Σ1 v = v H A21 Σ4 A21 v (8.135)

Noting that
T
v H Σ1 v ≥ σmin (Σ1 ), v H A21 Σ4 A21 v ≤ ||A21 v||2 σmax (Σ4 )
(8.136)
it follows from (8.135) that

(1 − |λ|2 ) σmin (Σ1 ) ≤ ||A21 v||2 σmax (Σ4 ) (8.137)

From Theorem 8.3 it follows that


 
 v 2
 A  ≤ 1 ⇐⇒ ||A11 v||2 +||A21 v||2 ≤ 1 ⇐⇒ ||A21 v||2 ≤ 1−|λ|2
0
(8.138)
By applying (8.138) to (8.137), we have

(1 − |λ|2 ) σmin (Σ1 ) ≤ (1 − |λ|2 ) σmax (Σ4 ) (8.139)


204 Design Methods in the Time Domain

From Theorem 8.4, it is clear that the subsystem (A11 , b1 , c1 )n is asymptoti-


cally stable, i.e., 1 − |λ|2 > 0. Hence

σmin (Σ1 ) ≤ σmax (Σ4 ) (8.140)

This contradicts with the assumption of the theorem. Therefore, the subsystem
is observable. Similarly, it can be shown that the subsystem is controllable.
This completes the proof of the theorem. 

8.6 Numerical Experiments


As an example, consider the problem of approximating an impulse response
of the “Gaussian filter”

fi = 0.256322 exp{−0.103203(i − 4)2 }

by an IIR digital filter over a finite interval 0 ≤ i ≤ 20. The impulse response
and the magnitude response of the Gaussian filter, i.e., the 20th-order FIR
digital filter are depicted in Figure 8.2.

8.6.1 Design Based on Extended Pade’s Approximation


A. A Direct Procedure

Using (8.12) and (8.13), the denominator coefficient vector a and the
numerator coefficient vector b of a 3rd-order IIR digital filter were found
to be
a = [−2.084581, 1.657505, −0.499733]T
b = [0.049165, −0.001237, 0.040055, 0.020835]T

Figure 8.2 The Gaussian filter. (a) Its impulse response. (b) Its magnitude response.
8.6 Numerical Experiments 205

respectively. The poles were given by


λ = 0.651023 ± j0.463440 (|λ| = 0.799130), λ = 0.782534
hence the resulting filter is stable. Magnitude response of the 3rd-order IIR
digital filter designed by a direct procedure is depicted in Figure 8.3.
B. A Modified Procedure
Using (8.12) and (8.21) with W = I 21 , the denominator coefficient vector a
and the numerator coefficient vector b of a 3rd-order IIR digital filter were
found to be
a = [−2.084581, 1.657505, −0.499733]T
b = [0.049165, −0.001355, 0.040022, 0.021245]T
respectively, and the poles were naturally the same as those of the above filter.
Magnitude response of the 3rd-order IIR digital filter designed by a modified
procedure is depicted in Figure 8.4.

8.6.2 Design Using Second-Order Information


A. The Use of Solution (8.36) for Given Order n = 3
Using (8.36) and (8.27), the denominator coefficient vector a and the
numerator coefficient vector b of a 3rd-order IIR digital filter were found
to be
a = [−2.084581, 1.657505, −0.499733]T
b = [0.049165, −0.001237, 0.040055, 0.020835]T

Figure 8.3 Magnitude response of a 3rd-order IIR digital filter designed by a direct procedure.
206 Design Methods in the Time Domain

Figure 8.4 Magnitude response of a 3rd-order IIR filter designed by a modified procedure.

respectively. The poles were computed as

λ = 0.651023 ± j0.463440 (|λ| = 0.799130), λ = 0.782534

hence the resulting filter is stable. Magnitude response of the 3rd-order IIR
digital filter designed by using (8.36) and (8.27) is depicted in Figure 8.5.

B. The Use of Efficient Algorithm (8.63)–(8.65) for Solving (8.35)

By choosing ε = 10−3 in (8.65) and using (8.63)–(8.65), it took the algorithm


4 iterations to converge to
T
a(4) = −2.243631, 2.191723, −1.098524, 0.237046

Figure 8.5 Magnitude response of a 3rd-order IIR digital filter designed by (8.36).
8.6 Numerical Experiments 207

and the poles were found to be

λ = 0.459785 ± j0.538302 (|λ| = 0.707934)


λ = 0.662031 ± j0.186280 (|λ| = 0.687739)

This shows that the resulting filter is stable. The numerator coefficient vector
b = [b0 , b1 , b2 , b3 , b4 ]T were then derived from (8.27) as
T
b = 0.049165, −0.009057, 0.050216, 0.018506, 0.009832

Magnitude response of the 4th-order IIR digital filter designed by using (8.63)–
(8.65) and (8.27) is depicted in Figure 8.6.
Detailed numerical results obtained by applying (8.63)–(8.65) are sum-
marized in Table 8.1.

Figure 8.6 Magnitude response of a 4th-order IIR digital filter designed by (8.63)–(8.65)
and (8.27).

Table 8.1 Convergence of the efficient algorithm using 2nd-order information


n+1 αn+1 |αn+1 − αn |
1 209.627568 × 10−4 232.522911 × 10−3
2 18.893975 × 10−4 19.073359 × 10−3
3 1.095412 × 10−4 1.779856 × 10−3
4 0.035643 × 10−4 0.105977 × 10−3
5 0.000589 × 10−4 0.003506 × 10−3
6 0.000004 × 10−4 0.000059 × 10−3
208 Design Methods in the Time Domain

8.6.3 Least-Squares Design


Let a 3rd-order IIR digital filter be designed. An initial estimate was derived
from (8.73) as

a(0) = [−2.084581, 1.657505, −0.499733]T

By choosing ε = 10−8 in (8.87), it took the algorithm in (8.86) 6 iterations to


converge to

a(6) = [−1.813163, 1.234429, −0.313226]T

By choosing ε = 10−8 in (8.95) and continuing with the second phase, it took
the algorithm in (8.94) 7 iterations to converge to

a∗ = [−1.811521, 1.231832, −0.312083]T

The poles were calculated as

λ = 0.573478 ± j0.375137 (|λ| = 0.685277), λ = 0.664565

hence the resulting filter is stable. After determining the error vector e(a∗ , b∗ )
and actual impulse response vector h(a∗ , b∗ ) from (8.78) and (8.68), respec-
tively, (8.70) was used to obtain the optimal numerator coefficient vector b∗
as
b∗ = [0.049165, 0.013344, 0.039997, 0.048732]T
Magnitude response of the 3rd-order IIR digital filter designed by a least-
squares method is depicted in Figure 8.7.

Figure 8.7 Magnitude response of a 3rd-order IIR digital filter designed by a least-squares
method.
8.6 Numerical Experiments 209

8.6.4 Design Using State-Space Model (Balanced Model


Reduction)
By eigenvalue-eigenvector decomposition of the Hankel matrix H 20,20 in
(8.98), the eigenvalues of H 20,20 in (8.106) were found to be
⎡ ⎤ ⎡ ⎤
σ 1 σ2 0.95351959 −0.33240485
⎢ σ3 σ4 ⎥ ⎢ 0.07664440 −0.01342993 ⎥
⎢ ⎥ ⎢ ⎥
⎢ σ5 σ6 ⎥ = ⎢ 0.00184360 −0.00019436 ⎥
⎢ ⎥ ⎢ ⎥
⎣ σ7 σ8 ⎦ ⎣ 0.00001498 −0.00000079 ⎦
σ9 σ10 0.00000003 −0.00000000
Using (8.111), the reduced-order state-space model (A11 , b1 , c1 , d)3 was
found to be
⎡ ⎤ ⎡ ⎤
0.863908 0.160275 −0.011029 0.477335
A11 = ⎣ −0.459756 0.594112 0.174003 ⎦ , b1 = ⎣ 0.636658 ⎦
−0.137212 −0.754645 0.356728 0.509285

c1 = 0.455148 −0.211628 0.039034 , d = 0.049165
and the denominator coefficient vector a and the numerator coefficient vector
b were found to be
a = [−1.814749, 1.236860, − 0.314268]T
b = [0.049165, 0.013180, 0.039977, 0.048437]T
respectively. The poles were given by
λ = 0.574196 ± j0.376723 (|λ| = 0.686747), λ = 0.666357
hence the resulting filter is stable. Magnitude response of the 3rd-order IIR
digital filter designed by balanced model reduction is depicted in Figure 8.8.

8.6.5 Comparison of Algorithms’ Performances


The performance of the design algorithms addressed in this chapter as applied
to the above design example is summarized in Table 8.2 where
(f0 −h0 )2 + (f1 −h1 )2 + · · · + (f20 −h20 )2
ε2 = × 100
f02 + f12 + · · · + f20
2

max |fi −hi |


0≤i≤20
ε∞ = × 100
max |fi |
0≤i≤20
210 Design Methods in the Time Domain

Figure 8.8 Magnitude response of a 3rd-order IIR digital filter designed by balanced model
reduction.

Table 8.2 Performance comparison among algorithms


Max. Negative Ripple
Algorithms Order ε2 ε∞ for 0 ≤ i ≤ 20

n=3 8.696952 8.820447 –0.000447


Direct Procedure
n=4 1.455143 1.552236 –0.003978
n=3 8.691246 8.830924 –0.000440
Modified Procedure
n=4 1.455106 1.552335 –0.003978
n=3 8.696952 8.820447 –0.000447
2nd-Order Information
n=4 1.455143 1.552236 –0.003978
n=3 2.501358 2.163644 –0.001453
Least-Squares Design
n=4 0.353882 0.318031 –0.000500
n=3 2.502378 2.136857 –0.001435
Balanced Model Reduction
n=4 0.353895 0.318568 –0.000503

8.7 Summary
In the techniques based on extended Pade’s approximation, the problem of
designing IIR digital filters to approximate a desired impulse response over
a specified interval has been studied. Two design procedures that require
only linear calculations have been illustrated for the approximation of IIR
References 211

digital filters. In the filter design using second-order information, mixed first-
and second-order information in the form of a finite portion of the impulse
response and autocorrelation sequences has been applied to provide an efficient
algorithm for designing IIR digital filters. In the least-squares design, a method
for obtaining the coefficients of an nth-order IIR digital filter, which gives the
optimal least-squares approximation to a desired impulse response over a finite
interval, has been presented. In the filter design using state-space model, an
algorithm, which is based on balanced model reduction, for the approximation
of FIR digital filters by IIR digital filters has been examined. Finally, numerical
experiments have been performed to compare their performances among the
filter design techniques in the time domain.

References
[1] M. S. Bertran, “Approximation of digital filters in one and two dimen-
sions,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-23,
no. 5, pp. 438–443, Oct. 1975.
[2] C. S. Burrus and T. W. Parks, “Time domain design of recursive
digital filters,” IEEE Trans. Audio Electroacoust., vol. AU-18, no. 2,
pp. 137–141, June 1970.
[3] R. Hastings-James and S. K. Mehra, “Extensions of the Pade-
approximant technique for the design of recursive digital filters,”
IEEE Trans. Acoust. Speech, Signal Process., vol. ASSP-25, no. 6,
pp. 501–509, Dec. 1977.
[4] C. T. Mullis and R. A. Roberts, “The use of second-order information in
the approximation of discrete-time linear systems,” IEEE Trans. Acoust.
Speech, Signal Process., vol. ASSP-24, no. 3, pp. 226–238, June. 1976.
[5] T. Hinamoto and S. Maekawa, “Separable-denominator 2-D rational
approximation via 1-D based algorithm,” IEEE Trans. Circuits Syst.,
vol. CAS-32, no. 10, pp. 989–999, Oct. 1985.
[6] R. E. Kalman and J. Bertram, “Control system design via the second
method of Liapunov, part II, discrete time systems,” ASME J. Basic
Engineering, vol. 82, pp. 394–400, 1960.
[7] A. G. Evans and R. Fischl, “Optimal least squares time-domain synthesis
of recursive digital filters,” IEEE Trans. Audio Electroacoust., vol. AU-
21, no. 1, pp. 61–65, Feb. 1973.
[8] D. G. Luenberger, Optimization by Vector Space Method. New York:
Wiley, 1969.
212 Design Methods in the Time Domain

[9] B. Beliczynski, I. Kale and G. D. Cain, “Approximation of FIR by IIR


digital filters: An algorithm based on balanced model reduction,” IEEE
Trans. Signal Process., vol. 40, no. 3, pp. 532–542, Mar. 1992.
[10] L. Pernebo and L. M. Silverman, “Model reduction via balanced state
space representations,” IEEE Trans. Autom. Control, vol. AC-27, no. 2,
pp. 382–387, Apr. 1982.
9
Design of Interpolated and FRM
FIR Digital Filters

9.1 Preview
Interpolated FIR (IFIR) filters [1, 2] and frequency-response-masking (FRM)
FIR filters [3] are well-known classes of computationally efficient digital filters
because the total number of multipliers and adders required to implement
these filters are considerably less than those required by their conventional
counterparts [4]. There has been a great deal of work in the literature following
the aforementioned original development, see for example [5–15] and the
references therein. Because of the importance of these filters, it is naturally
desirable to develop a simple and unifying design method that is applicable to
both of the filter classes. In this chapter, we propose such a design technique
in that the subfilters involved are jointly optimized in the minimax sense.
The core of the proposed design approach is the convex-concave procedure
(CCP) [16–18] that allows the use of efficient convex optimization to deal
with nonconvex design problems where the objective functions assume the
form of difference of two convex functions.
Our focus is on the design of original IFIR and single stage FRM filters so
as for the reader to sense this general design strategy in a simple and transparent
manner. We also explain why the CCP is well-suited that simultaneously
promotes sparsity of filter coefficients for the designs at hand. In addition,
we present a variant of the CCP-based technique for the design of FRM filters
with improved implementation efficiency.

9.2 Basics of IFIR and FRM Filters and CCP


9.2.1 Interpolated FIR Filters
IFIR filters are introduced in [1] and further investigated in [2]. As shown in
Figure 9.1, an IFIR filter is composed of a cascade of two FIR filters, whose
transfer function assumes the form

213
214 Design of Interpolated and FRM FIR Digital Filters

F(zL) M(z)

Figure 9.1 An IFIR filter.

H(z) = F (z L )M (z) (9.1)


where L > 0 is an integer that determines the degree of the filter’s sparsity,
hence its computational efficiency.
Suppose the parent transfer function F (z) represents a lowpass filter with
normalized passband [0, ωp ], transition band [ωp , ωa ], and stopband [ωa , 1].
Then F (z L ) is a periodic filter because the baseband of the frequency response
F (ejLω ) is reduced to (1/L)-th of the baseband of F (z). This is illustrated
in Figure 9.2 where the lowpass filter F (z) possesses a passband [0, 0.2] and
a transition band [0.2, 0.4]. With L = 4, Figure 9.2b depicts the magnitude
response of F (z L ). As expected, the first passband and transition band of
F (z L ) are reduced to [0, 0.2]/L = [0, 0.05] and [0.2 0.4]/L = [0.05 0.1],
respectively. Consequently, the passband of F (z L ) is considerably narrower
than that of F (z) and its magnitude roll-off is much sharper.
Clearly, for F (z L ) to be of use in constructing a lowpass filter with
narrow passband and sharp roll-off, the undesired passbands of the periodic
filter F (z L ) must be suppressed. As can be seen in Figure 9.1, this is
done by connection F (z −L ) in cascade with a lowpass M (z) which is
known as interpolator. As an illustration, Figure 9.3(a) shows the mag-
nitude response of a linear-phase lowpass filter M (z) whose passband is
[0, 0.05] and Figure 9.3(b) displays the magnitude response of the IFIR filter
H(z) = F (z L )M (z).

9.2.2 Frequency-Response-Masking Filters


Another class of FIR filters with very narrow transition bands, known as
FRM filters, is proposed in [3] and has since been a subject of study
[4–15]. As illustrated in Figure 9.4, a single stage FRM filter has a connected
parallel structure where the linear phase periodic filter F (z L ) and its delay
complementary periodic filers, z −L(N −1)/2 − F (z L ), are cascaded with
masking filters Ma (z) and Mc (z), respectively, so as to produce an FIR filter
with sharper transition bands.
The transfer function of the FRM filter in Figure 9.4 is given by

H(z) = F (z L )Ma (z) + [z −L(N −1)/2 − F (z L )]Mc (z) (9.2)


9.2 Basics of IFIR and FRM Filters and CCP 215

0.8

0.6

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(a) Normalized frequency

0.8

0.6

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(b) Normalized frequency

Figure 9.2 Magnitude response of (a) F (z) and (b) F (z L ) with L = 4.

where N denotes the length of F (z) and is assumed to be an odd integer


throughout. If we denote the complementary filter of F (z) by G(z) =
z −(N −1)/2 − F (z), then (9.2) becomes

H(z) = F (z L )Ma (z) + G(z L )Mc (z) (9.3)

Hence an FRM filter is essentially a two-channel filter bank with an interpo-


lated FIR filter in each channel. Because {F (z), G(z)} is a complementary
pair, their passbands are complementary to each other. Consequently, by
connecting these IFIR filters in parallel with adequately chosen interpolators
Ma (z) and Mc (z), lowpass (as well as other standard types) FRM filers with
216 Design of Interpolated and FRM FIR Digital Filters

0.8

0.6

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(a) Normalized frequency

0.8

0.6

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(b) Normalized frequency

Figure 9.3 Magnitude response of (a) M (z) and (b) H(z) = F (z L )M (z).

F(zL) Ma(z)

-
z-L(N-1)/2 + Mc(z) +
Figure 9.4 A single-stage FRM filter.
9.2 Basics of IFIR and FRM Filters and CCP 217

arbitrary passband can be designed. It is this feature that distinguishes FRM


filters from IFIR filters which are limited to narrow-band frequency responses.

9.2.3 Convex-Concave Procedure (CCP)


As will become clear shortly, jointly optimizing all sub-filters involved in an
IFIR or FRM filter is not a convex problem. The CCP is a heuristic method
of convexifying nonconvex problems. It is known that a function f (x) with
continuous second-order derivatives can be expressed as a difference of two
convex functions [23]. If the function in question has a bounded Hessian, such
an expression, namely f (x) = u(x)−v(x), can be explicitly constructed [19].
We are thus motivated to consider a nonconvex problem of the form
minimize f (x) = u0 (x) − v0 (x) (9.4a)
subject to uj (x) ≤ vj (x) for j = 1, 2, · · · , q (9.4b)
where uj (x) and vj (x) for j = 0, 1, · · · , q are convex. The CCP is an iterative
procedure for solving (9.4). In the kth iteration where iterate xk is known,
CCP performs two steps:
(i) convexification of the objective function and constraints at xk by replacing
each vj (x) by its affine approximation

v̂j (x, xk ) = vj (xk ) + ∇vj (xk )T (x − xk )


where ∇ denotes the gradient operator.
(ii) Solve the convex problem
minimize fˆ(x) = u0 (x) − v̂0 (x) (9.5a)
subject to uj (x) ≤ v̂j (x) for j = 1, 2, · · · , q (9.5b)
One can start CCP by solving (9.5) with an initial x0 that is feasible for the
original problem (9.4). This means that uj (x0 ) − vj (x0 ) ≤ 0 for 1 ≤ j ≤ q
which in conjunction with the convexity of vj (x) implies that
uj (x0 ) − v̂j (x0 , x0 ) = uj (x0 ) − vj (x0 ) ≤ 0
hence x0 is also a feasible point for the convex problem in (9.5). Also note
that if xk+1 is produced by solving (9.5) in the kth iteration, then xk+1 is
also feasible for the original problem in (9.4) because the convexity of vj (x)
implies that vj (xk+1 ) ≥ v̂j (xk+1 , xk ) which leads to
uj (xk+1 ) − vj (xk+1 ) ≤ uj (xk+1 ) − v̂j (xk+1 , xk ) ≤ 0
218 Design of Interpolated and FRM FIR Digital Filters

Another desirable property of CCP is that it is a descent method [20], namely


the original objective function, u0 (x)−v0 (x), decreases monotonically, at the
iterates {xk } generated by solving convex problem (9.5). To see this, note that
fk = u0 (xk ) − v0 (xk ) = u0 (xk ) − v̂0 (xk , xk )
(9.6)
≥ u0 (xk+1 ) − v̂0 (xk+1 , xk )
Since v0 (x) is convex, we have v0 (xk+1 ) ≥ v̂0 (xk+1 , xk ), hence
u0 (xk+1 ) − v̂0 (xk+1 , xk ) ≥ u0 (xk+1 ) − v0 (xk+1 ) = fk+1
which in conjunction with (9.6) gives fk+1 ≤ fk . It follows that if the objective
is bounded from below, then {fk } converges to a finite limit. As expected, the
convergence of CCP-produced iterates {xk } cannot be assured in general
because it deals with noncovex problems as a heuristic method after all.
Nevertheless, it can be shown that under mild conditions {xk } converges
to critical points of the original problem [21]. The CCP may be terminated in
several ways, including by a certain number of iterations or by monitoring if
the difference in objective function between two consecutive iterations is less
than a given convergence tolerance.

9.3 Minimax Design of IFIR Filters


9.3.1 Problem Formulation
Following (9.1), let the transfer functions of the parent filter and interpolator
of an IFIR filter be given by
N
 −1 N
 i −1

F (z) = fn z −n , M (z) = mn z −n (9.7)


n=0 n=0

respectively, and the frequency response of an IFIR filter is given by


H(ejω ) = F (ejLω )M (ejω ) (9.8)
Assume both F (z) and M (z) are of linear phase response, the zero-phase
frequency response of the IFIR filter is given by
H0 (af , am , ω) = [aTf tf (Lω)] · [aTm tm (ω)] (9.9)
where af and am are coefficient vectors determined by the impulse responses
of F (z) and M (z), respectively, tf (ω) and tm (ω) are vectors with trigono-
metric components determined by the filter lengths and types (e.g. odd or even
9.3 Minimax Design of IFIR Filters 219

length and symmetrical or antisymmetrical filter coefficients). In the case of


both F (z) and M (z) being of odd length, for example, we have
⎡ ⎤ ⎡ ⎤
f(N −1)/2 m(Ni −1)/2
⎢2f(N +1)/2 ⎥ ⎢2m(N +1)/2 ⎥
⎢ ⎥ ⎢ i ⎥
af = ⎢ .. ⎥ , a m = ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦
2fN −1 2mNi −1
⎡ ⎤ ⎡ ⎤
1 1
⎢ cos ω ⎥ ⎢ cos ω ⎥
⎢ ⎥ ⎢ ⎥
tf (ω) = ⎢ .. ⎥, tm (ω) = ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦
cos[(N − 1)ω/2] cos[(Ni − 1)ω/2]
where fn ’s and mn ’s are from (9.7).
Let Hd (ω) be the desired zero-phase response, the frequency-weighted
minimax design of an IFIR filter amounts to finding vectors af and am that
solves the problem
min max w(ω)|H0 (af , am , ω) − Hd (ω)| (9.10)
af ,am ω∈Ω
where w(ω) > 0 is a frequency-selective weight over a frequency domain
of interest, Ω. Evidently, problem (9.10) is nonconvex with respect to design
variables xT = [aTf aTm ].

9.3.2 Convexification of (9.10) Using CCP


By introducing an upper bound δ for the objective function in (9.10) and
treating the bound as an additional design variable, problem (9.10) becomes
minimize δ (9.11a)
subject to [aTf tf (Lω)][aTm tm (ω)] ≤ δw + Hd (ω) (9.11b)
−[aTf tf (Lω)][aTm tm (ω)] ≤ δw − Hd (ω) (9.11c)
for ω ∈ Ω, where δw = δ/w(ω). Bearing CCP in mind, we add the term
0.5s(x, ω) with
s(x, ω) = (aTf tf (Lω))2 + (aTm tm (ω))2 (9.12)
to both sides of (9.11b) and (9.11c) to obtain an equivalent pair of constrains
as
[aTf tf (Lω) + aTm tm (ω)]2 ≤ s(x, ω) + 2δw + 2Hd (ω) (9.13a)
220 Design of Interpolated and FRM FIR Digital Filters

[aTt tf (Lω) − aTm tm (ω)]2 ≤ s(x, ω) + 2δw − 2Hd (ω) (9.13b)


Clearly, the functions on both sides of (9.13a) and (9.13b) are convex with
respect to x and δ, therefore (9.13) is suitable for application of CCP. We
proceed by replacing s(x, ω) in (9.13) by its linearization at xk , namely
s(xk , ω) + ∇s(xk , ω)T (x − xk ). This gives

[aTf tf (Lω) + (−1)i aTm tm (ω)]2 ≤ ηi (x, xk , ω), i = 0, 1 (9.14)

with

ηi (x, xk , ω) = s(xk , ω) + ∇s(xk , ω)T (x − xk ) + 2δw + (−1)i 2Hd (ω)

where for practical purposes ω is taken from Ωd = {ωj | j = 1, 2, · · · , K} ⊂


Ω, a finite discrete set of frequency grids that are sufficiently dense and placed
uniformly over the frequency region of interest Ω.
In summary, the convex problem to be solved in the kth iteration of CCP
is given by
minimize δ (9.15a)
subject to [aTf tf (Lωj ) + (−1)i aTm tm (ωj )]2 ≤ ηi (x, xk , ωj ) (9.15b)
for i = 0, 1 and j = 1, 2, · · · , K
where ωj ∈ Ωd . It is well known that this problem of minimizing a
linear function subject to convex quadratic constraints can be formulated
as a semidefinite programming (SDP) or second-order cone programming
(SOCP) problem [24], which can be solved efficiently [25–27]. The CCP-
based algorithm may be terminated after a given number of iterates {xk }
have been generated or, when the difference between two consecutive error
bounds, namely |δk−1 − δk |, is less than a given convergence tolerance. In
either case, the last iterate produced from (9.15) is taken to be a solution of
the design problem. A step-by-step description of the proposed algorithm is
given below as Algorithm 1.
Algorithm 1 for IFIR filters
Step 1: input x0 , (N, Ni ), Ωd , Hd (ω) for ω ∈ Ωd , w(ω), and Ki
Step 2: for k = 0, 1, · · · , Ki − 1
(i) solve (9.15) for af and am
(ii) construct xk+1 = [aTf aTm ]T
end
Step 3: output x∗ = xKi
9.3 Minimax Design of IFIR Filters 221

9.3.3 Remarks on Convexification in (9.13)–(9.14)


Note that expressing a nonconvex function as f (x) = u(x) − v(x) is not
unique. In fact if f (x) = u(x) − v(x) holds with u(x) and v(x) convex,
then f (x) = ũ(x) − ṽ(x) also holds with ũ(x) = u(x) + w(x) and ṽ(x) =
v(x) + w(x) where w(x) is an arbitrary convex function, hence both ũ(x)
and ṽ(x) remain convex. A natural question arising from this observation is
how this non-uniqueness affects the CCP. In CCP, a nonconvex constraint
u(x) ≤ v(x) is replaced by

u(x) ≤ v(xk ) + ∇v(xk )T (x − xk ) (9.16)

Now if we treat the above nonconvex constraint as u(x) + w(x) ≤ v(x) +


w(x) with a nonlinear convex w(x) (here we assume w(x) is nonlinear,
because adding a linear w(x) does not affect the CCP at all) and apply CCP,
the constraint would be replaced by

u(x) + w(x) ≤ v(xk ) + ∇v(xk )T (x − xk )


+ w(xk ) + ∇w(xk )T (x − xk )

i.e.,
u(x) + e(x, xk ) ≤ v(xk ) + ∇v(xk )T (x − xk ) (9.17)
where, due to the convexity of w(x), e(x, xk ) = w(x) − w(xk ) −
∇w(xk )T (x − xk ) is convex and always nonnegative. On comparing (9.17)
with (9.16), we note that a point x that satisfies constraint (9.17) also satisfies
constraint (9.16), but the converse does not hold (unless w(x) is a linear
function which make e(x, xk ) vanish). In other words, adding a redundant
convex component w(x) to the decomposition f (x) = u(x) − v(x) shrinks
the feasible region in a CCP-based method, hence imposing the risk of losing
good solution candidates.
With above analysis in mind, we now examine the convexification steps
made in Section 9.3.2. For illustration purposes, here we focus on the treatment
of constraint (9.11b) since the same analysis also applies to (9.11c). By writing
(9.11b) as

1 T T 0 tf (Lω)tm (ω)T af
[af am ] ≤ δw + Hd (ω) (9.18)
2 tm (ω)tf (Lω)T 0 am

we see that the left-hand side of (9.18) is a quadratic function with an indefinite
Hessian, hence nonconvex. Therefore, constraint (9.11b) does not fit into the
222 Design of Interpolated and FRM FIR Digital Filters

form in (9.4b), and an adequate convex term needs to be added to both sides of
(9.18) to make CCP applicable. Intuitively, the quadratic expression in (9.18)
suggests to add the term

1 T T tf (Lω)tf (Lω)T 0 af
[a a ] (9.19)
2 f m 0 tm (ω)tm (ω)T am

which itself is convex and precisely equal to 0.5s(x, ω) (see (9.12) for the
definition of s(x, ω)). In doing so, (9.18) is led to

1 T T tf (Lω)tf (Lω)T tf (Lω)tm (ω)T af


[a a ]
2 f m tm (ω)tf (Lω)T tm (ω)tm (ω)T am
1
≤ s(x, ω) + δw + Hd (ω) (9.20)
2
where the functions on both sides are convex, hence fitting nicely into (9.4b).
It is important to realize that the function on the left-hand side of (9.20)
is convex but not strictly convex because its Hessian is merely a rank-one
matrix, implying that adding anything less than that of (9.19) will not lead to
a formulation suitable for CCP. Clearly, (9.20) is identical to (9.13a), hence
the above explains the convexification steps in Section 9.3.2.

9.4 Minimax Design of FRM Filters


9.4.1 The Design Problem
Following (9.2), let the transfer functions of the periodic filter and two masking
filters of an FRM filter be given by
N
 −1 N
a −1 N
c −1
−n
F (z) = fn z , Ma (z) = m(a)
n z
−n
, Mc (z) = m(c)
n z
−n

n=0 n=0 n=0


(9.21)
respectively, the zero-phase frequency response of a single-stage FRM filter
can be expressed as

H(x, ω) = [aTf tf (Lω)][aTa ta (ω) − aTc tc (ω)] + aTc tc (ω) (9.22)

where af , aa , and ac are coefficient vectors associated with filters F (z),


Ma (z) and Mc (z) in Figure 9.4 and tf (ω), ta (ω) and tc (ω) are respective
trigonometric vectors determined by the lengths and types of the FIR filters
9.4 Minimax Design of FRM Filters 223

involved. Given a desired zero-phase frequency response Hd (ω), a frequency-


weighted minimax design seeks to find variable vector x = [aTf aTa aTc ]T
that solves the problem

min max w(ω)|H(x, ω) − Hd (ω)| (9.23)


x ω∈Ω
where w(ω) > 0 is a frequency-selective weight defined over a frequency
region of interest, Ω. From (9.22), it is evident that (9.23) is a nonconvex
problem with respect to design variable x.

9.4.2 A CCP Approach to Solving (9.23)


By bounding the objective in (9.23) from above by δ and treating the bound
as an additional design variable, we arrive at

minimize δ (9.24a)

subject to H(x, ω) − δw − Hd (ω) ≤ 0, ω∈Ω (9.24b)


−H(x, ω) − δw + Hd (ω) ≤ 0, ω∈Ω (9.24c)
where δw = δ/w(ω). We now take an approach similar to that of Section 9.3.2
to reformulate (9.24). By adding the convex term

v(x, ω) = [aTf tf (Lω)]2 + 12 [aTa ta (ω)]2 + 21 [aTc tc (ω)]2 (9.25)

to both sides of (9.24b) and (9.24c), we obtain an equivalent pair of constraints


as
u1 (x, ω) ≤ v(x, ω) (9.26a)
u2 (x, ω) ≤ v(x, ω) (9.26b)
where
1
u1 (x, ω) = (aTfa P 0 af a + aTfc Q1 af c ) + aTc tc (ω) − δw − Hd (ω) (9.26c)
2
and
1
u2 (x, ω) = (aTfa P 1 af a + aTfc Q0 af c ) − aTc tc (ω) − δw + Hd (ω) (9.26d)
2
with
af af
af a = , af c =
aa ac
224 Design of Interpolated and FRM FIR Digital Filters

are convex with respect to x and δ because their Hessian matrices are
characterized by positive semidefinite blocks P i = pi pTi and Qi = q i q Ti
with
tf (Lω) tf (Lω)
pi = , qi = for i = 0, 1.
(−1)i ta (ω) (−1)i tc (ω)

Consequently, the constraints in (9.26a) and (9.26b) fit into the form of (9.4b).
By applying CCP to (9.26), we obtain a pair of convex constraints

u1 (x, ω) ≤ ṽ(x, xk , ω)

u2 (x, ω) ≤ ṽ(x, xk , ω)
where
ṽ(x, xk , ω) = v(xk , ω) + ∇T v(xk , ω)(x − xk )
with ⎡ ⎤
2(aTf tf (Lω))tf (Lω)
⎢ ⎥
∇v(x, ω) = ⎣ (aTa ta (ω))ta (ω) ⎦
(aTc tc (ω))tc (ω)
where for practical purposes ω is taken from a finite and dense frequency grids
Ωd = {ωj | j = 1, 2, · · · , K} ⊂ Ω. In summary, the convex problem to be
solved in the kth iteration of CCP is given by

minimize δ (9.27a)

subject to u1 (x, ωj ) ≤ ṽ(x, xk , ωj ), 1 ≤ j ≤ K (9.27b)


u2 (x, ωj ) ≤ ṽ(x, xk , ωj ), 1 ≤ j ≤ K (9.27c)
Like the case of IFIR filter design, this problem of minimizing a linear function
subject to convex quadratic constraints can be formulated as an SDP or SOCP
problem [24], which can be solved efficiently [25–27].
Finally, we remark that the Hessian of functions u1 (x, ω) in (9.27b) and
u2 (x, ω) in (9.27c) are of rank-two and positive semidefinite, but not positive
definite as can be seen from (9.26c) and (9.26d). Therefore, an argument
similar to that in Section 9.3.3 can be applied to conclude that constraints
(9.27b) and (9.27c) define largest feasible region relative to other CCP-
compatible options as discussed in Section 9.3.3. A step-by-step summary
of the algorithm is given below.
9.5 FRM Filters with Reduced Complexity 225

Algorithm 2 for FRM filters


Step 1: input x0 , (N, Na , Nc ), Hd (ω) for ω ∈ Ωd , w(ω), and Ki
Step 2: for k = 0, 1, · · · , Ki − 1
solve (9.27) for xk+1
end
Step 3: output x∗ = xKi

9.5 FRM Filters with Reduced Complexity


FRM filters are widely considered computationally efficient [3, 4, 9]. Variants
of FRM filters with further reduced complexity have also been proposed in the
literature. Unlike these variants, in this section we propose a CCP algorithm
which is actually a modified version of that developed in Section 9.4 for the
design of FRM filters that simultaneously promotes sparsity of the impulse
responses of the subfilters involved so as to reduce implementation complexity.
Here an FIR filter H1 (z) is said to more sparse than filter H2 (z) of same length,
if the impulse response of H1 (z) contains more zero entries than that of H2 (z).
The algorithm proposed below consists of two phases:

9.5.1 Design Phase 1


The aim of phase 1 is to identify the locations (i.e. indices) of filter coefficients
that may be set to zero without substantially affecting filter’s performance.
Motivated by the fact that for most large underdetermined systems of linear
equations the minimal l1 -norm solution is also the sparsest solution [28, 29],
we propose to promote the sparsity of an FRM filter by solving a modified
version of (9.27) where the objective function combines upper bound with a
weighted l1 -norm of design variable x, namely,

minimize δ + μx1 (9.28a)

subject to u1 (x, ωj ) ≤ ṽ(x, xk , ωj ), 1 ≤ j ≤ K (9.28b)


u2 (x, ωj ) ≤ ṽ(x, xk , ωj ), 1 ≤ j ≤ K (9.28c)

where weight μ > 0 controls the trade-off between error bound δ and sparsity
of filter coefficients. Once the solution xs = [aTf aTa aTc ]T of problem (9.28)
is obtained, a prescribed threshold ε > 0 is used to identify an index set
Io = {Iaf , Iaa , Iac } as follows:
226 Design of Interpolated and FRM FIR Digital Filters

Iaf = {i : |af (i)| ≤ ε}, Iaa = {i : |aa (i)| ≤ ε}, Iac = {i : |ac (i)| ≤ ε}
(9.29)
Concerning the selection of parameters μ and ε, in principle they should be
small: μ should be small because otherwise the solution of (9.28) becomes
less relevant to the design objective which is to minimize the error bound
δ; and ε should be small because the use of a large ε would set some
intrinsically nonzero coefficients to zero, hence inevitably do harm to the
filter performance. We stress that the goal of phase 1 is merely to identify an
index set to nullify the associated filter coefficients, and it is a collective action
of μ and ε that identifies an appropriate index set: the introduction of term
μx1 promotes the sparsity in x, and the size of the index set is controlled
by threshold ε.

9.5.2 Design Phase 2


The design now proceeds with a second phase in that the remaining nonzero
entries of the impulse responses are optimally tuned by minimizing the same
error bound as in (9.27a), but subject to additional constraints that the filter
coefficients with indices in set {Iaf , Iaa , Iac } are equal to zero. Namely, we
solve
minimize δ (9.30a)
subject to u1 (x, ωi ) ≤ ṽ(x, xk , ωi ) for i = 1, 2, · · · , K (9.30b)
u2 (x, ωi ) ≤ ṽ(x, xk , ωi ) for i = 1, 2, · · · , K (9.30c)
af (i) = 0 for i ∈ Iaf (9.30d)
aa (i) = 0 for i ∈ Iaa (9.30e)
ac (i) = 0 for i ∈ Iac (9.30f )
Since the additional constraints are all linear equalities, problem (9.30)
remains convex and can be solved efficiently. The algorithm proposed above
is outlined below.
Algorithm 3 for sparse FRM filters
Step 1: input x0 , (N, Na , Nc ), Ωd , Hd (ω) for ω ∈ Ωd , w(ω), μ, ε,
and Ki
Step 2: for k = 0, 1, · · · , Ki − 1
solve (9.28) to obtain x = [aTf aTa aTc ]T
apply (9.29) to x to obtain Iaf , Iaa , and Iac
solve (9.30) to obtain xk+1
end
Step 3: output x∗ = xKi
9.6 Design Examples 227

9.6 Design Examples


We now present several design examples to illustrate the design methods
proposed above.

9.6.1 Design and Evaluation Settings


A common goal of the designs presented below is to construct a linear-
phase FIR digital system with narrow transition band, whose implementation
efficiency is achieved via an IFIR or FRM structure. We follow the convention
to evaluate the performance of each design in terms of peak-to-peak passband
ripple Ap (dB) and minimum stopband attenuation Aa (dB) [4] which are
defined by 
pmax
Ap = 20 log10
pmin
and
Aa = −20 log10 (amax )
where pmax and pmin denote the maximum and minimum |H(ejω )| over
passband, respectively, and amax denotes the maximum |H(ejω )| over
stopband.
As a part of performance evaluation, each design is compared with one or
more designs made by existing algorithms from the literature. In addition, these
designs are compared with their counterparts obtained using conventional FIR
structure, see Section 9.6.4 below.

9.6.2 Design of IFIR Filters


Example 9.1
The algorithm proposed in Section 9.3 was applied to design a lowpass IFIR
filter with normalized passband edge ωp = 0.15π, stopband edge ωa = 0.2π.
The sparsity factor was set to L = 4, and orders of F (z) and M (z) were set
to 31 and 17, respectively. The frequency weight w(ω) was set to w(ω) ≡ 1
for ω in the passband and w(ω) ≡ 2 for ω in the stopband. An initial x0 =
[aTf0 aTm0 ]T was generated by the standard technique proposed in [3]. A total

of K = 1400 frequency grids were uniformly placed in [0, ωp ] [ωa , π] to
form the discrete set Ωd for problem (9.15). It took the algorithm 91 iterations
to converge to an IFIR filter with Ap = 0.0317 dB and Aa = 60.84 dB.
The same design problem was addressed as Example 10.29 in [4] using the
method described in [30]. The method was implemented as function ifir in
the Signal Processing Toolbox of MATLAB. With [F,M] = ifir(4,low,
228 Design of Interpolated and FRM FIR Digital Filters

[0.15, 0.2], [0.002, 0.001], advanced), the function returns with


optimized impulses of filter F (z) of order 31 and M (z) of order 17 (in
Example 10.29 of [4], the order of M (z) was said to be 16, however the
order of M (z) produced by the above MATLAB code was actually 17),
with Ap = 0.0340 dB and Aa = 60.18 dB. The passband and stopband
amplitude responses of the two designs are depicted as solid and dashed lines
in Figures 9.5a and 9.5b, respectively.

0.03
Proposed
Method of [30], [4 ]
0.025

0.02

0.015

0.01

0.005

−0.005

−0.01

−0.015

−0.02
0 0.05 0.1 0.15
(a) Normalized frequency

−58
Proposed
Method of [30], [4 ]

−60

−62

−64

−66

−68

−70
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(b) Normalized frequency

Figure 9.5 Amplitude response (in dB) of the IFIR filters for Example 9.1 by the proposed
algorithm (solid line) and the method of [30] and [4] (dashed line) in (a) passband and (b)
stopband.
9.6 Design Examples 229

Example 9.2
Reference [9] presents an example (Example 10.2) where a highpass IFIR filter
with sampling frequency ωs = 16000 Hz, stopband edge ωa = 6600 Hz and
passband edge ωp = 7200 Hz is designed. In the design, the interpolator
is fixed to M (z) = (1 − z −1 )4 and sparsity factor is set to L = 2
while a linear-phase subfilter F (z) of order 20 is optimized in minimax
sense. The performance of the IFIR filter is given by Ap = 0.9061 dB and
Aa = 40.78 dB.
As a follow-up of the above example, the algorithm proposed in Section 9.3
was applied to jointly optimize an F (z) of order 20 and an M (z) of order 4 for a
highpass IFIR filter H(z) = F (z L )M (z) with the same design specifications
as Example 10.2 of [9]. The frequency weight w(ω) was set to w(ω) ≡ 1
for ω in the passband and w(ω) ≡ 4.5 for ω in the stopband. An initial
x0 = [aTf0 aTm0 ]T was generated by the standard technique proposed in [3].
 total of K = 1400 frequency grids were uniformly placed in [0, 6600 Hz]
A
[7200 Hz, 8000 Hz] to form the discrete set Ωd for problem (9.15). It took
the algorithm 14 iterations to converge to an IFIR filter with Ap = 0.6914 dB
and Aa = 41.07 dB. The amplitude responses of the two designs in the entire
baseband and over its passband are depicted respectively as solid and dashed
lines in Figures. 9.6a and 9.6b.
For the sake of a more efficient implementation, M (z) is normalized
by dividing it with the coefficient of its constant term, m0 , so that both
coefficients of the constant term and the 4th-order term become unity. The
transfer function F (z L ) is rescaled to m0 F (z L ) so that the overall transfer
function H(z) remains unaltered. The first eleven coefficients of F (z) and
first three coefficients of M (z) after the normalization are shown in Table 9.1.
We stress that implementing the interpolator M (z) in Example 10.2 of [9]
requires no multiplications, therefore the performance gain of the proposed
design over that of [9] was achieved at a cost of two more multiplications per
output.

9.6.3 Design of FRM Filters


Example 9.3
The algorithm proposed in Section 9.4 was applied to design a lowpass FRM
filter with the same design specifications as in the first example in [3] and
[10]. The normalized passband and stopband edges were ωp = 0.6π and
ωa = 0.61π. The sparsity factor was set to L = 9, and orders of F (z), Ma (z),
and Mc (z) were 44, 40, and 32, respectively. A trivial weight w(ω) ≡ 1 was
230 Design of Interpolated and FRM FIR Digital Filters


Figure 9.6 Amplitude response (in dB) of the IFIR filters in Example 9.2 by the proposed
algorithm (solid line) and the method of [9] (dashed line) in (a) entire baseband and (b) passband.

utilized. With K = 1100, it took the algorithm 87 iterations to converge to an
FRM filter with Ap = 0.1320 dB and Aa = 42.49 dB. These figures compare
favorably with those achieved in [3] (Ap = 0.1792 dB and Aa = 40.96 dB),
which has been a benchmark for FRM filters, and with those reported in [11]
(Ap = 0.1348 dB and Aa = 42.25 dB). The amplitude response of the FRM
filter in the entire baseband and passband is shown in Figure 9.7.
Example 9.4
The algorithm proposed in Section 9.4 was applied to design a lowpass FRM
filter with the same passband, stopband, and sparsity factor as in Example 9.3,

Table 9.1 Coefficients of F(z) and M(z) for Example 9.2

    fi for i = 0, 1, ..., 10        mi for i = 0, 1, 2
     0.002287418334499               1
     0.001299308145379              −2.439354581245607
     0.001489820681472               2.965711835392111
    −0.001928211344492
    −0.004121111010416
    −0.005691224464356
    −0.002031403630543
     0.005260711732820
     0.015907596775075
     0.024282544840202
     0.028106104367025


Figure 9.7 Amplitude response (in dB) of the FRM filters in Example 9.3 in (a) entire
baseband and (b) passband.

i.e., ωp = 0.6π, ωa = 0.61π, and L = 9. However, the orders of F(z), Ma(z),
and Mc(z) were reduced to 42, 36, and 28, respectively. With w(ω) ≡ 2.1 and
K = 1000, it took the algorithm 600 iterations to converge to an FRM filter
with Ap = 0.2600 dB and Aa = 43.01 dB. The amplitude response of the
FRM filter in the entire baseband and passband is shown in Figure 9.8. The
above design specifications coincide with those of a design presented in [16],
which is based on a neural network approach. From the numerical results
provided in Tables 1–3 of [16], it was found that Ap = 0.2652 dB (instead of
0.0672 dB as reported in [16], as there is a deep notch at the passband edge
0.6π) and Aa = 42.42 dB.


Figure 9.8 Amplitude response (in dB) of the FRM filter in Example 9.4 in (a) entire baseband
and (b) passband.

Example 9.5
The algorithm proposed in Section 9.4 was also applied to design a lowpass
FRM filter with orders of F (z), Ma (z), and Mc (z) being (56, 30, 24),
L = 7, and normalized passband and stopband edges being ωp = 0.65π
and ωa = 0.66π, respectively. With weight w(ω) ≡ 1 and K = 2400, it took
the algorithm 94 iterations to converge to an FRM filter with Ap = 0.1510
dB and Aa = 41.22 dB. The amplitude response of the FRM filter in the
entire baseband and passband is shown in Figure 9.9. For comparison, in
[10] lowpass FRM filters with the same passband and stopband edges and


Figure 9.9 Amplitude response (in dB) of the FRM filters in Example 9.5 in (a) entire
baseband and (b) passband.

interpolation factor L are designed using a weighted least-squares Chebyshev
approach. With the orders of F(z), Ma(z), and Mc(z) being (56, 32, 26), the
performance of the FRM filters is reported as Ap = 0.1960 dB and
Aa = 40.11 dB for a filter named Filter 4, and Ap = 0.1920 dB and
Aa = 40.44 dB for a filter named Filter 5.
Example 9.6
The algorithm described in Section 9.5 was applied to re-design the FRM
filter presented in Example 9.5 so as to reduce its complexity while maintaining
comparable performance. With weight w(ω) ≡ 1, K = 1000, and μ = 0.05,
phase 1 of the design was carried out using 30 iterations of (9.28) to yield a
coefficient vector xs = [a_f^T a_a^T a_c^T]^T. With a threshold ε = 0.0006, the
three index sets defined in (9.29) were identified as Iaf = {13, 18, 25},
Iaa = {11, 15, 16}, and Iac = {11, 13}, indicating that a total of eight
coefficients may be set to zero without substantially degrading the
filter's performance, provided that the remaining coefficients are optimized in
the second phase of the design. Proceeding with design phase 2 with weight
w(ω) ≡ 1 and K = 2400, it took 10 iterations of (9.30) to converge to
an FRM filter with Ap = 0.1629 dB and Aa = 40.56 dB, in which the eight
coefficients whose indices were identified above have been set to zero. The
amplitude response of the FRM filter in the entire baseband and passband is
shown in Figure 9.10.
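The phase-1 thresholding step admits a simple sketch; here af, aa, and ac stand for the subfilter coefficient blocks of xs, and the exact indexing convention of (9.29) is assumed rather than reproduced:

    % Mark near-zero coefficients as candidates for removal.
    epsilon = 0.0006;
    Iaf = find(abs(af) < epsilon);   % e.g., {13, 18, 25} in this design
    Iaa = find(abs(aa) < epsilon);   % e.g., {11, 15, 16}
    Iac = find(abs(ac) < epsilon);   % e.g., {11, 13}
    af(Iaf) = 0;  aa(Iaa) = 0;  ac(Iac) = 0;   % zeroed before phase 2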

9.6.4 Comparisons with Conventional FIR Filters


The six designs presented above are compared with their conventional
counterparts, namely linear-phase equiripple FIR filters designed using the
Parks-McClellan (P-M) algorithm [4] with practically the same performance,
in terms of the respective Ap and Aa, as those in Examples 9.1–9.6. The
comparisons are made in terms of computational complexity, measured by the
number of multiplications required per output sample, and the overall group
delay of the filter; see Table 9.2.
It is observed from Table 9.2 that in all design instances the IFIR and
FRM filters designed by the proposed algorithms offer considerably improved
computational efficiency at the cost of slightly increased group delay.
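As a rough check of how the Design-2 entries of Table 9.2 arise (the subfilter orders are from Example 9.2; the P-M order 42 is inferred from the table itself):

    % A linear-phase FIR filter of even order N needs N/2 + 1 distinct
    % multipliers and has a group delay of N/2 samples.
    Nf = 20; Nm = 4; L = 2;                    % Design 2
    mult_ifir = (Nf/2 + 1) + (Nm/2 + 1 - 1);   % unity end taps of M(z) free: 13
    tau_ifir  = (L*Nf + Nm)/2;                 % 22
    Npm = 42;                                  % P-M order matching the table
    mult_pm = Npm/2 + 1;                       % 22
    tau_pm  = Npm/2;                           % 21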

9.7 Summary
We have proposed a unified approach based on CCP to the design of minimax
IFIR and FRM filters. The design method is conceptually simple and produces
designs that are shown to provide satisfactory performance relative to those


Figure 9.10 Amplitude response (in dB) of the FRM filters in Example 9.6 in (a) entire
baseband and (b) passband.

Table 9.2 Comparisons with conventional FIR filters

              IFIR/FRM Filters              P-M FIR Filters [4]
    Design  Multiplications  Group Delay  Multiplications  Group Delay
      1           25             70.5           65             64
      2           13             22             22             21
      3           60             218            210            209
      4           56             207            193            192
      5           58             211            202            201
      6           50             211            198            197

available from the literature. In addition, the proposed design method is


shown to allow extension to the design of FRM filters that simultaneously
promote sparsity of filter coefficients to reduce implementation complexity
without substantially degrading the filter's performance. Design examples
have been presented to illustrate the proposed design algorithms, which compare
favorably with several known design techniques. Extensions of CCP-based
designs to multistage FRM filters are possible, but developing a CCP type of
convexification that maintains the largest feasible region turns out to be challenging.

References
[1] Y. Neuvo, C. Y. Dong, and S. K. Mitra, “Interpolated finite impulse
response filters,” IEEE Trans. Acoust., Speech, Signal Process., vol.
ASSP-32, no. 3, pp. 563–570, June 1984.
[2] T. Saramäki, Y. Neuvo, and S. K. Mitra, “Design of computationally
efficient interpolated FIR filters,” IEEE Trans. Circuits Syst., vol. CAS-
35, no. 1, pp. 70–88, Jan. 1988.
[3] Y. C. Lim, “Frequency-response masking approach for the synthesis of
sharp linear phase digital filters,” IEEE Trans. Circuits Syst., vol. CAS-
33, no. 4, pp. 357–364, Apr. 1986.
[4] S. K. Mitra, Digital Signal Processing – A Computer-Based Approach,
3rd ed., McGraw Hill, 2006.
[5] Y. C. Lim and Y. Lian, “The optimum design of one- and two-dimensional
FIR filters using the frequency response masking technique,” IEEE
Trans. Circuits Syst. II, vol. 40, no. 2, pp. 88–95, Feb. 1993.
[6] Y. C. Lim and Y. Lian, “Frequency-response masking approach for digital
filter design: Complexity reduction via masking filter factorization,”
IEEE Trans. Circuits Syst. II, vol. 41, no. 8, pp. 518–525, Aug. 1994.
[7] T. Saramäki, Y. C. Lim, and R. Yang, “The synthesis of half-band filter
using frequency-response masking technique,” IEEE Trans. Circuits
Syst. II, vol. 42, no. 1, pp. 58–60, Jan. 1995.
[8] T. Saramäki and H. Johansson, “Optimization of FIR filters using
frequency-response masking approach,” in Proc. IEEE Int. Symp.
Circuits Syst., vol. 2, pp. 177–180, Sydney, Australia, May 2001.
[9] P. S. R. Diniz, E.A. B. da Silva, and S. L. Netto, Digital Signal Processing,
Cambridge University Press, 2002.
[10] L. C. R. de Barcellos, S. L. Netto, and P. S. R. Diniz, “Optimization of
FRM filters using the WLS-Chebyshev approach,” Circuits, Syst., Signal
Process., vol. 22, no. 2, pp. 99–113, Mar. 2003.

[11] W.-S. Lu and T. Hinamoto, “Optimal design of frequency-response-


masking filters using semidefinite programming,” IEEE Trans. Circuits
Syst. I, vol. 50, no. 4, pp. 557–568, Apr. 2003.
[12] T. Saramäki, J. Yli-Kaakinen, and H. Johansson, “Optimization of
frequency-response masking based FIR filters,” J. Circuits, Syst., Com-
put., vol. 12, pp. 563–589, May 2003.
[13] W.-S. Lu and T. Hinamoto, “Optimal design of frequency-response-
masking filters using second-order cone programming,” in Proc. IEEE
Int. Symp. Circuits Syst., vol. 3, pp. 878–881, Bangkok, Thailand, May
2003.
[14] Y. Liu and Z. Lin, “Optimal design of frequency-response masking filters
with reduced group delays,” IEEE Trans. Circuits Syst. I, vol. 55, no. 6,
pp. 1560–1570, July 2008.
[15] Y. Wei and D. Liu, “Improved design of frequency-response masking
filters using band-edge shaping filter with non-periodical frequency
response,” IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3269–3278,
July 2013.
[16] X.-H. Wang and Y.-G. He, “A neural network approach to FIR filter design
using frequency-response masking technique,” Signal Process., vol. 88,
pp. 2917–2926, 2008.
[17] J. Yli-Kaakinen and T. Saramäki, “An efficient algorithm for the optimiza-
tion of FIR filters synthesized using the multistage frequency-response
masking approach,” Circuits, Syst., Signal Process., vol. 30, no. 1,
pp. 157–183, 2011.
[18] Y. Wei, S. Huang, and X. Ma, “A novel approach to design low-cost two-
stage frequency-response masking filters,” IEEE Trans. Circuits Syst. II,
vol. 62, no. 10, pp. 982–986, Oct. 2015.
[19] A. L. Yuille and A. Rangarajan, “The concave-convex procedure,” Neural
Computation, vol. 15, no. 4, pp. 915–936, 2003.
[20] T. Lipp and S. Boyd, “Variations and extensions of the convex-concave
procedure,” Research Report, Stanford University, Aug. 2014.
[21] B. K. Sriperumbudur and G. R. Lanckriet, “On the convergence of
the concave-convex procedure,” in Advances in Neural Information
Processing Systems, pp. 1759–1767, 2009.
[22] W.-S. Lu and T. Hinamoto, “A unified approach to the design of interpo-
lated and frequency-response-masking FIR filters,” IEEE Trans. Circuits
Syst. I, vol. 63, no. 12, pp. 2257–2266, Dec. 2016.
[23] P. Hartman, “On functions representable as a difference of convex
functions,” Pacific J. of Math., vol. 9, no. 3, pp. 707–713, 1959.

[24] A. Antoniou and W.-S. Lu, Practical Optimization: Algorithms and


Engineering Applications, Springer, 2007.
[25] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization
over symmetric cones,” Optimization Methods and Software, vol. 11–12,
pp. 625–633, 1999.
[26] R. H. Tütüncü, K. C. Toh, and M. J. Todd, “Solving semidefinite-
quadratic-linear programs using SDPT3,” Mathematical Programming,
Series B, vol. 95, pp. 189–217, 2003.
[27] M. Grant, S. Boyd, and Y. Ye, “Disciplined convex programming,”
in Global Optimization: from Theory to Implementation, Nonconvex
Optimization and Its Applications, L. Liberti and N. Maculan, eds.,
Springer, 2006.
[28] D. L. Donoho, “For most large underdetermined systems of linear
equations the minimal l1 -norm solution is also the sparsest solution,”
Comm. Pure Applied Math., vol. 59, no. 6, pp. 797–829, June 2006.
[29] E. J. Candès and M. B. Wakin, “An introduction to compressive sam-
pling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, Mar.
2008.
[30] T. Saramäki, “Finite impulse response filter design,” in Handbook
for Digital Signal Processing, S. K. Mitra and J. F. Kaiser eds.,
Wiley-Interscience, New York NY, 1993.
10
Design of a Class of Composite Digital Filters

10.1 Preview
A composite filter (C-filter) refers to a digital filtering system that is composed
of explicit individual modules, often called subfilters, which are connected in
cascade or parallel, or a mixture of both. Well known classes of C-filters
include interpolated FIR filters [1, 2], frequency-response-masking filters
[3], and their variants, see e.g. [4]. Over the years, analysis and design of
C-filters have attracted a great deal of research interest primarily because
of their ability to offer improved computational efficiency relative to their
conventional counterparts when appropriate subfilter structures are chosen
and their parameters are optimized in accordance with a certain design criterion
[5]. Recently, C-filters composed of a prototype filter and a shaping filter,
connected in cascade, have been shown to offer certain advantages over conventional
counterparts [6, 7]. In the case of FIR C-filters [6], the shaping filter is
constructed by cascading several complementary comb filters (CCFs) of the
form (1 + z −l )kl with integers 1 ≤ l ≤ L and kl ≥ 0. As such, the shaping
filter only requires several adders and memory units to implement and is free
of multiplications. Yet, as demonstrated in [6], the shaping filter is capable
of effectively improving filter performance, especially for those with narrow
transition bands and highly suppressed stopbands.
In this chapter, we present a new algorithm for the design of C-filters
that assume the same form H(z) = Hp(z)Hs(z) as in [6]; however, the
prototype filter Hp(z) and the shaping filter Hs(z) are designed through
different approaches in order for H(z) to achieve equiripple passbands and
least-squares stopbands (EPLSS). EPLSS FIR filters, especially with narrow
passbands, are desirable in wireless communications, aerospace systems, and
synthetic aperture radar [8]. The design algorithm proposed in this chapter
uses a sequential optimization technique to deal with the design of Hp (z)


and Hs (z) separately, yet these two design steps are coupled by alternately
performing them. A strong motivation to develop such a design method is that
both design steps can be formulated as convex problems and hence can be solved
efficiently. Numerical examples are presented to illustrate the design method
and evaluate its performance.

10.2 Composite Filters and Problem Formulation


10.2.1 Composite Filters
We consider a C-filter composed of two subfilters, known as the prototype
filter Hp(z) and the shaping filter Hs(z), respectively, connected in cascade
as shown in Figure 10.1. Thus the transfer function of the C-filter assumes the
form

H(z) = Hp (z) · Hs (z) (10.1)


In this chapter, we examine a class of linear-phase FIR C-filters where the
prototype filter is an FIR filter of length N with transfer function
    Hp(z) = Σ_{n=0}^{N−1} hn z^{−n}     (10.2)

which largely determines the characteristics of H(z), while the shaping filter
is a CCF of the form

    Hs(z) = ∏_{l=1}^{L} (1 + z^{−l})^{kl}     (10.3)

where the kl are nonnegative integers. The shaping filter reshapes the prototype
filter for improved performance without a substantial increase in implementation
complexity.
Figure 10.1 A composite filter.

For clarity of presentation, in the rest of the chapter we focus on linear-
phase lowpass FIR C-filters whose frequency response can be expressed as
H(ω) = Hp(ω)Hs(ω) with

    Hp(ω) = Σ_{n=0}^{N−1} hn e^{−jnω} = e^{−jτp ω} [x^T c(ω)]     (10.4)

where τp = (N − 1)/2 is the group delay of Hp(z), x is a coefficient vector
determined by the impulse response of Hp(z), and c(ω) is a vector with
trigonometric components, and

    Hs(ω) = ∏_{l=1}^{L} (1 + e^{−jlω})^{kl} = e^{−jτs ω} ∏_{l=1}^{L} (2 cos(lω/2))^{kl}     (10.5)

where

    τs = (1/2) Σ_{l=1}^{L} l · kl

Therefore, the zero-phase frequency response of the C-filter is given by

    A(ω) = Ap(ω) · As(ω) = x^T c(ω) ∏_{l=1}^{L} (2 cos(lω/2))^{kl}     (10.6)

and the group delay of the C-filter is given by τ = τp + τs.


The behavior of the shaping filter Hs(z) is determined by the number L of
CCFs used and the power kl of each CCF. As long as kl > 0, the first notch
of a single CCF (1 + z^{−l})^{kl} over the normalized baseband [0, π] occurs at
ω = π/l; see, for example, Figure 10.2(a) for 1 + z^{−4} (a scaling factor 0.5
has been used in the figure to normalize the filter gain at ω = 0 to unity).
By cascading several CCFs with appropriate powers kl , a shaping filter can
offer much reduced stopband energy while retaining decent passband gain. As
an example, Figure 10.2(b) depicts the amplitude response of a shaping filter
Hs (z) with L = 4 and {kl , l = 1, 2, 3, 4} = {4, 4, 1, 2} where the filter gain
at ω = 0 has been normalized to unity.
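A minimal MATLAB sketch of evaluating the normalized amplitude in (10.5) for this example is as follows:

    % Amplitude response of Hs(z) with L = 4 and powers {4, 4, 1, 2};
    % at w = 0 each factor equals 2^k_l, so dividing by 2^sum(k)
    % normalizes the dc gain to unity, as in Figure 10.2(b).
    k  = [4 4 1 2];
    w  = linspace(0, pi, 1024);
    As = ones(size(w));
    for l = 1:numel(k)
        As = As .* (2*cos(l*w/2)).^k(l);
    end
    As = abs(As) / 2^sum(k);
    plot(w/pi, 20*log10(max(As, eps)));       % gain in dB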

10.2.2 Problem Formulation


Given a desired lowpass frequency response Hd(ω), prototype filter length
N, and number of CCFs L, we seek a linear-phase FIR C-filter of the
form (10.1)–(10.3) such that the peak-to-peak amplitude ripple in the passband
is minimized subject to constraints on the filter's stopband energy as well as
its peak stopband gain and total group delay τ. Note that the constraints imposed
here lead to an enhanced EPLSS C-filter in which the largest peak in the stopband is
under control and, in addition, the group delay is bounded so as to ensure its

Figure 10.2 Amplitude response of (a) 1 + z^{−4} and (b) (1 + z^{−1})^4 (1 + z^{−2})^4 (1 + z^{−3})^1 (1 + z^{−4})^2.

utility for on-line applications. The design objective is achieved by solving
the constrained problem

    min_{x,y}  max_{ω∈Ωp} |H(ω) − Hd(ω)|     (10.7a)

    subject to:  ∫_{ωa}^{π} |H(ω)|² dω ≤ ea     (10.7b)

                 max_{ω∈Ωa} |H(ω)| ≤ δa     (10.7c)

                 (1/2) Σ_{l=1}^{L} l · kl ≤ D     (10.7d)
where Ωp and Ωa denote the passband and stopband, respectively, ωa denotes
the stopband edge, x from (10.4) and y = [k1 k2 . . . kL ]T are design variables,
ea , δa and D are constants representing upper bounds for stopband energy,
peak filter gain in stopband, and group delay of the shaping filter, respectively.
Note that the constraint in (10.7d) implies an upper bound (N − 1)/2 + D for
the total group delay of the C-filter.

10.3 Design Method


10.3.1 Design Strategy
From (10.6), we see that design variables x and y are separate from each
other. Moreover, frequency response A(ω) depends on x linearly, but on y
highly nonlinearly. In addition, all components of variable y are constrained
to be nonnegative integers. Under these circumstances, it is intuitively natural
to optimize these design variables separately in an alternate fashion. This
leads to a sequential procedure where in each step one of the variables, say
x, is optimized while the other variable, y, is held fixed, and the solution so
produced, say xk , is held fixed in the next step when variable y is updated to
y k+1 . The alternating optimization continues until a stopping criterion is met
and the last pair of solutions (y K , xK ) is taken to be the optimal design. The
rest of this section is devoted to describing the technical details involved in
the design procedure.

10.3.2 Solving (10.7) with y Fixed to y = y k


With a fixed y = y_k that satisfies (10.7d), constraint (10.7d) can be neglected. By
introducing an upper bound δp for the objective function as an auxiliary
variable, the problem at hand becomes

    min δp     (10.8a)

    subject to:  |H(ω) − Hd(ω)| ≤ δp  for ω ∈ Ωp     (10.8b)

                 ∫_{ωa}^{π} |H(ω)|² dω ≤ ea     (10.8c)

                 |H(ω)| ≤ δa  for ω ∈ Ωa     (10.8d)

If the desired frequency response assumes the form Hd(ω) = e^{−jτω} Ad(ω),
then the constraints in (10.8b)–(10.8d) reduce to

    |x^T ĉ(ω) − Ad(ω)| ≤ δp  for ω ∈ Ωp     (10.9a)

    x^T Q x ≤ ea     (10.9b)

    |x^T ĉ(ω)| ≤ δa  for ω ∈ Ωa     (10.9c)

where

    ĉ(ω) = c(ω) ∏_{l=1}^{L} (2 cos(lω/2))^{kl}

    Q = ∫_{ωa}^{π} ĉ(ω) ĉ^T(ω) dω

In a realistic implementation, one has to deal with a finite set of constraints.
This is accomplished by replacing the sets Ωp and Ωa with their discrete
counterparts Ω̂p = {ωp1, ωp2, ..., ωpK1} and Ω̂a = {ωa1, ωa2, ..., ωaK2},
respectively, with the frequency grids placed uniformly over the respective
bands. This simplifies problem (10.8) to

    min δp     (10.10a)

    subject to:  |x^T ĉ(ω) − Ad(ω)| ≤ δp  for ω ∈ Ω̂p     (10.10b)

                 x^T Q x ≤ ea     (10.10c)

                 |x^T ĉ(ω)| ≤ δa  for ω ∈ Ω̂a     (10.10d)

With respect to variables (δp , x), the objective function and constraints
(10.10b) and (10.10d) are linear, while (10.10c) is a convex quadratic con-
straint because Q is positive definite. As a result, (10.10) is a convex problem
whose global solution can be computed using reliable and convenient solvers
[9, 10]. We denote the solution of (10.10) by (δk , xk ).
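For illustration, (10.10) maps almost verbatim into a modeling language such as CVX [10]. The sketch below assumes precomputed data that are not spelled out in the text: matrices Cp and Ca stacking ĉ(ω)^T over the passband and stopband grids, a vector Ad of desired passband amplitudes, the matrix Q, and the bounds ea and da:

    % Schematic CVX formulation of problem (10.10).
    cvx_begin
        variables x(n) dp
        minimize( dp )
        subject to
            abs(Cp*x - Ad) <= dp;      % (10.10b)
            quad_form(x, Q) <= ea;     % (10.10c)
            abs(Ca*x) <= da;           % (10.10d)
    cvx_end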

10.3.3 Updating y with x Fixed to x = xk


Consider the stopband energy given by

    J(y) = ∫_{ωa}^{π} |H(ω)|² dω = ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1}^{L} (4 cos²(lω/2))^{kl} dω     (10.11)

where x = xk is obtained by solving (10.10) and is fixed throughout this step.


Our strategy for updating y is to minimize J(y) with respect to y subject to
several relevant constraints on y.
A technical difficulty in applying continuous optimization to y
is that the components of y must be integers. This problem is overcome
by extending the kl's from the domain of nonnegative integers to that of
nonnegative reals. When an optimizing y with non-integer components is
obtained, an integer solution can be found by rounding its components to
the nearest integers.
Noting that for any a > 0 the derivative of f(x) = a^x is given by

    ∂f/∂x = a^x log a

the gradient and Hessian of J(y) can be evaluated by computing

    ∂J(y)/∂ki = ∂/∂ki ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1}^{L} (4 cos²(lω/2))^{kl} dω

              = ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1, l≠i}^{L} (4 cos²(lω/2))^{kl} · ∂/∂ki (4 cos²(iω/2))^{ki} dω
                                                                                          (10.12a)
              = ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1}^{L} (4 cos²(lω/2))^{kl} · log(4 cos²(iω/2)) dω

              = ∫_{ωa}^{π} |H(ω)|² log(4 cos²(iω/2)) dω

and

    ∂²J(y)/∂ki∂kj = ∂²/∂ki∂kj ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1}^{L} (4 cos²(lω/2))^{kl} dω

                  = ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1, l≠i,j}^{L} (4 cos²(lω/2))^{kl}
                      · ∂²/∂ki∂kj [(4 cos²(iω/2))^{ki} (4 cos²(jω/2))^{kj}] dω
                                                                                          (10.12b)
                  = ∫_{ωa}^{π} |x^T c(ω)|² ∏_{l=1}^{L} (4 cos²(lω/2))^{kl}
                      · log(4 cos²(iω/2)) log(4 cos²(jω/2)) dω

                  = ∫_{ωa}^{π} |H(ω)|² log(4 cos²(iω/2)) log(4 cos²(jω/2)) dω
respectively. Note that with an arbitrary column vector v of length L,

    v^T ∇²J(y) v = Σ_{i=1}^{L} Σ_{j=1}^{L} (∂²J(y)/∂ki∂kj) vi vj

                 = ∫_{ωa}^{π} |H(ω)|² Σ_{i=1}^{L} Σ_{j=1}^{L} log(4 cos²(iω/2)) log(4 cos²(jω/2)) vi vj dω

                 = ∫_{ωa}^{π} |H(ω)|² [Σ_{i=1}^{L} log(4 cos²(iω/2)) vi] · [Σ_{j=1}^{L} log(4 cos²(jω/2)) vj] dω
                                                                                          (10.13)
                 = ∫_{ωa}^{π} |H(ω)|² [Σ_{i=1}^{L} log(4 cos²(iω/2)) vi]² dω ≥ 0

hence J(y) is convex, although it is a rather complicated function as shown in
(10.11). This motivates a convex quadratic approximation of J(y) as

    Ĵ(y, y_k) = J(y_k) + d_k^T ∇J(y_k) + (1/2) d_k^T ∇²J(y_k) d_k     (10.14)

where d_k = y − y_k. We update y by minimizing Ĵ(y, y_k) subject to several
constraints. These include an upper bound on group delay τ as seen in (10.7d)
and nonnegativeness of the components of y. An additional constraint is
imposed to ensure the performance of the C-filter over the passband, especially
at passband edge ωp :

    1 − dp ≤ |H(ω)|_{ω=ωp} = |x_k^T c(ωp)| ∏_{l=1}^{L} |2 cos(lωp/2)|^{kl} ≤ 1 + dp     (10.15)

with a small dp > 0. This constraint is highly nonlinear with respect to y, but
fortunately it is equivalent to
    log(1 − dp) ≤ c + Σ_{l=1}^{L} kl log|2 cos(lωp/2)| ≤ log(1 + dp)     (10.16)

with c = log|x_k^T c(ωp)|, which is linear with respect to y. Based on the above
analysis, we propose to update y by solving the convex problem

    minimize  (1/2)(y − y_k)^T ∇²J(y_k)(y − y_k) + (y − y_k)^T ∇J(y_k)     (10.17a)

    subject to:  (10.7d), y ≥ 0, and (10.16)     (10.17b)

and then rounding its solution to a nearest integer solution y_{k+1}.
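Since the closed forms (10.12a) and (10.12b) involve only one-dimensional integrals of |H(ω)|² against logarithmic factors, the gradient and Hessian are cheap to evaluate numerically. A sketch using trapezoidal quadrature (H2 holds grid samples of |H(ω)|² on [ωa, π]; wa and L are assumed given, and the eps guard avoids taking the logarithm of zero at the cosine nulls):

    w  = linspace(wa, pi, numel(H2)).';
    lg = log(4*cos((1:L).*w/2).^2 + eps);     % grid-by-L matrix of log factors
    g  = zeros(L,1);  Hess = zeros(L,L);
    for i = 1:L
        g(i) = trapz(w, H2(:).*lg(:,i));                    % (10.12a)
        for j = 1:L
            Hess(i,j) = trapz(w, H2(:).*lg(:,i).*lg(:,j));  % (10.12b)
        end
    end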

10.3.4 Summary of the Algorithm


The algorithm described above is outlined as Algorithm 1.

Algorithm 1 for C-Filters

Step 1 input y 0 , (N, L, D), ωp , ωa , δa , ea , dp , and K.


Step 2 for k = 0, 1, . . .
(i) fix y = y_k and solve (10.10) for x_k;
(ii) fix x = x_k and perform K iterations of (10.17) to obtain y_{k+1};
(iii) if y_{k+1} ≠ y_k, set k = k + 1 and repeat from step (i); otherwise
go to Step 3.
Step 3 output x* = x_k, y* = y_k.
We remark that because both (10.10) and (10.17) are convex problems,
globally optimal iterates xk and y k can be calculated efficiently.
Since the objective function in (10.17a) represents the filter’s stopband
energy, minimizing it tends to increase some powers in the shaping filter until
the group delay of Hs (z) reaches upper bound D in constraint (10.7d). When
this occurs, iterate y k remains unchanged and Algorithm 1 terminates.

10.4 Design Example and Comparisons


We illustrate the design method proposed above by applying Algorithm 1 to
design a narrow-band lowpass C-filter with linear phase response and sharp
transition band specified by ωp = 0.1π and ωa = 0.11π. The length of Hp (z)
was set to N = 519 and the peak gain of the C-filter in the stopband was set
to be no greater than −60 dB, i.e., δa = 0.001.
The performance of the filters was evaluated in terms of the peak-to-peak
passband ripple Ap (in dB), minimum stopband attenuation Aa (in dB),
stopband energy Ea, the number of multiplications M per output sample,
and group delay τ. With L = 7, D = 18, y_0 = [1 1 1 1 1 1 1]^T,
dp = 0.08, and ea = 5 × 10⁻⁴, problem (10.10) was solved and its solution
x_0 together with y_0 defines a C-filter achieving Ap = 0.1681, Aa = 60,
and Ea = 2.15 × 10⁻⁸. With x_0 held fixed, K = 5 iterations of (10.17)
were performed to obtain an integer solution y_1 = [1 1 1 1 1 1 2]^T. Since
y_1 ≠ y_0, problem (10.10) was solved again with y fixed to y_1, where ea
was adjusted in order to obtain a solution x_1 with practically the same peak
stopband gain and stopband energy as x_0, so that the two iterates x_0 and
x_1 can be compared with each other in terms of peak passband ripple. With
ea = 2.95 × 10⁻⁴, the C-filter specified by (x_1, y_1) achieved Ap = 0.1389,
Aa = 60.04 and Ea = 2.12 × 10⁻⁸. We then ran (10.17) again with x fixed
to x_1 and K = 5, which yielded the integer solution y_2 = [1 1 1 1 1 1 2]^T. Since
y_2 = y_1, the algorithm terminated and (x_1, y_1) is claimed as the solution.
The amplitude responses of Hp (z) and H(z) are shown in Figures 10.3(a) and
10.3(b), respectively, and the amplitude response of H(z) in the passband is
depicted in Figure 10.3(c).
The filters that are most relevant to the C-Filters addressed in this chapter
are linear-phase EPLSS FIR filters with constrained peak gain in stopband and
linear-phase FIR filters with equiripple passbands and stopbands obtained
using the Parks-McClellan (P-M) algorithm. For comparison purposes, an
EPLSS lowpass filter and a P-M lowpass filter with the same ωp and ωa
as those in the C-filter were designed, both satisfy the same peak stopband
gain of −60 dB as the C-filter. The evaluation results are summarized
in Table 10.1.
It is observed that the C-filter outperforms the two conventional filters at
the cost of a slight increase in group delay. Relative to the EPLSS filter,
the C-filter offers reduced passband ripple and requires fewer multiplications
for implementation. Compared with the P-M filter, the C-filter also offers
smaller passband ripple, a reduced number of multiplications, and considerably
smaller stopband energy.

Figure 10.3 Amplitude response of (a) the prototype filter Hp (z), (b) the C-filter H(z) and
(c) the C-filter H(z) over the passband.

Table 10.1 Comparisons of C-filter with EPLSS and P-M filters

    Filters    N    Ap      Aa     Ea            M    τ
    C-filter  519  0.1389  60.04  2.12 × 10⁻⁸   260  277.5
    EPLSS     551  0.1465  60.03  2.13 × 10⁻⁸   276  275
    P-M       531  0.1463  60.03  1.36 × 10⁻⁶   266  265

10.5 Summary
We have addressed the design of a class of composite filters by an alternating
convex optimization strategy to achieve equiripple passband and least-squares
stopband subject to a peak-gain constraint. A design example is presented
to illustrate the proposed algorithm and to demonstrate the performance of
the C-filter relative to conventional EPLSS and P-M filters.

References
[1] Y. Neuvo, C. Y. Dong, and S. K. Mitra, “Interpolated finite impulse
response filters,” IEEE Trans. Acoust., Speech, Signal Process.,
Vol. ASSP-32, pp. 563–570, Jun. 1984.
[2] T. Saramäki, Y. Neuvo, and S. K. Mitra, “Design of computationally
efficient interpolated FIR filters,” IEEE Trans. Circuits Syst., vol. 35,
pp. 70–88, Jan. 1988.
[3] Y. C. Lim, “Frequency-response masking approach for the synthesis of
sharp linear phase digital filters,” IEEE Trans. Circuits Syst., vol. 33,
pp. 357–364, Apr. 1986.
[4] W.-S. Lu and T. Hinamoto, “A unified approach to the design of interpo-
lated and frequency-response-masking FIR filters,” IEEE Trans. Circuits
Syst. I, vol. 63, no. 12, pp. 2257–2266, Dec. 2016.
[5] S. K. Mitra, Digital Signal Processing – A Computer-Based Approach,
3rd ed., McGraw Hill, 2006.
[6] D. Shiung, Y.-Y. Yang, and C.-S. Yang, “Improving FIR filters by using
cascade techniques,” IEEE Signal Processing Mag., vol. 33, pp. 108–114,
May 2016.
[7] D. Shiung, Y.-Y. Yang, and C.-S. Yang, “Cascading tricks for designing
composite filters with sharp transition bands,” IEEE Signal Processing
Mag., Vol. 33, No. 1, pp. 151–157 and 162, Jan. 2016.

[8] J. W. Adams, “FIR digital filters with least-squares stopbands subject


to peak-gain constraints,” IEEE Trans. Circuits Syst., vol. 39, no. 4,
pp. 376–388, Apr. 1991.
[9] SeDuMi 1.3: http://sedumi.ie.lehigh.edu/
[10] CVX 2.1: http://cvxr.com/cvx/
11
Finite Word Length Effects

11.1 Preview
Algorithms for linear filtering are realized as programs for general-purpose
digital computers or with special-purpose digital hardware in which sequence
values and coefficients are stored in a binary format with finite-length registers.
When a digital filter is designed with high accuracy and its coefficients are
quantized, the characteristics of the resulting digital filter differ inevitably from
the original design. For example, coefficient quantization may turn a stable
filter into an unstable one. In addition, when a sequence to be processed is obtained
by sampling a band-limited analog signal, the A/D converter produces only a
finite number of possible values for each sample.
In the implementations of a digital filter, numbers are stored in finite-
length registers. As a result, if sequence values and coefficients cannot be
accommodated in the available registers then they must be quantized before
being stored. There are three types of errors in number quantization:
(1) Coefficient-quantization errors
(2) Product-quantization errors
(3) Input-quantization errors
If coefficient quantization is applied, then the frequency characteristic of the
resulting digital filter will inevitably differ from the desired one. Product-
quantization errors occur at the outputs of multipliers. For example, a b-bit
data sample multiplied by a b-bit coefficient results in a product that is 2b
bits long. If the result of arithmetic operations is not quantized in a recursive
realization of a digital filter, the number of bits will increase indefinitely as
data processing continues. Since a uniform register length must be used in
practice throughout the filter, each product must be rounded or truncated
before processing continues. The errors induced by rounding or truncation
propagate through the filter and give rise to output noise, referred to as output roundoff

253
254 Finite Word Length Effects

noise. Input-quantization errors occur in applications in which a digital filter is
utilized to process continuous-time signals.
In this chapter, we review the fixed-point and floating-point arithmetic
of binary numbers and the two’s complement representation of negative
numbers. Limit cycles—overflow oscillations, scaling fixed-point digital
filters to prevent overflow, roundoff noise, and coefficient sensitivity will be
addressed. In addition, the response of a finite-word-length (FWL) state-space
description and methods for obtaining limit cycle-free realization will also be
examined.

11.2 Fixed-Point Arithmetic


The binary number generated by the A/D converter is assumed to be
a fixed-point number. Among the three forms of number representation in
fixed-point arithmetic, namely, signed magnitude, one's complement, and two's
complement, two's complement is most often used due to its easy
implementation. A two's complement representation of a real number x is
given by
    x = Δ(−b0 + Σ_{i=1}^{∞} bi 2^{−i}),   −Δ ≤ x ≤ Δ     (11.1)
where bi for i ≥ 0 is unity or zero, the first bit b0 is the sign bit, namely, b0 = 1
if x < 0 and b0 = 0 if x > 0, and the value of Δ is arbitrary.
Using a finite-length register of L + 1 bits, the actual number stored is
quantized to Q[x] where
    Q[x] = Δ(−b0 + Σ_{i=1}^{L} bi 2^{−i})     (11.2)

Fixed-point numbers are stored in registers, as illustrated in Figure 11.1. The


quantized representation of x, denoted by Q[x], must be an integral multiple
of the smallest quantum q with
q = Δ2−L (11.3)

Figure 11.1 Storage of fixed-point numbers.


11.2 Fixed-Point Arithmetic 255

which is the finest separation between the 2^{L+1} numbers we can represent with
L + 1 bits. The number q can be arbitrarily defined by choosing an appropriate
Δ, and is called the quantization step size.
The error between the real number x and its finite binary representation is
given by
e = x − Q[x] (11.4)
It shall be assumed that in forming Q[x] the number x is rounded to the nearest
integer multiple of q. As a result, the quantizer that defines the relationship
between x and Q[x] has a characteristic as shown in Figure 11.2 where Δ is
normalized to be unity, i.e., Δ = 1.
From Figure 11.2, it is observed that the error e in (11.4) due to quantization
lies between −q/2 and q/2, that is,
    −q/2 ≤ e ≤ q/2     (11.5)
Theoretical studies and numerical experiments have shown [1, 2] that the
error e can be approximated as a random noise, uniformly distributed on
[−q/2, q/2] with probability density shown in Figure 11.3. Evidently, the

Figure 11.2 Quantizer characteristic for rounding of two’s complement numbers.

Figure 11.3 Probability density function of the quantization error.

mean value of the error e is zero, and the variance associated with this error
distribution is given by
    σe² = E[e²] = ∫_{−q/2}^{q/2} (1/q) e² de = q²/12     (11.6)
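This model is easy to verify empirically; a self-contained MATLAB sketch (the register length chosen here is arbitrary):

    % Rounding to step q gives errors uniform on [-q/2, q/2] with
    % zero mean and variance q^2/12, as in (11.6).
    q = 2^-8;                      % Delta = 1 with L = 8 fractional bits
    x = 2*rand(1e6,1) - 1;         % test samples spread over [-1, 1)
    e = x - q*round(x/q);          % quantization error e = x - Q[x]
    fprintf('mean = %g, var = %g, q^2/12 = %g\n', mean(e), var(e), q^2/12);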

If a number x is larger than the largest number representable or smaller than


the smallest number representable, an overflow occurs. Overflows generally
create large errors and thus must be avoided.
In two’s complement representations, if nothing is done after an overflow,
the overflow characteristic is periodic and a small overflow causes an error
of approximately 2Δ, as shown in Figure 11.4 (a). Another way of handling
overflow is to use a saturation characteristic where the overflowed register is
reset to the largest or smallest number representable. However, it is not as easy
to implement as the two’s complement characteristic. The saturation overflow
characteristic is shown in Figure 11.4 (b).


Figure 11.4 Overflow characteristics. (a) Two’s complement overflow characteristic.
(b) Saturation overflow characteristic.

11.3 Floating-Point Arithmetic


Two basic disadvantages exist in a fixed-point arithmetic. First, the range of
numbers we can handle is small. For example, the smallest number is −1 and
the largest is 1 − 2^{−L}, provided Δ = 1 in the two's complement representation.
Second, there is a tendency for the percentage error generated by truncation
or rounding to increase as the magnitude of the number is decreased. These
problems can be addressed to a large extent by employing a floating-point
arithmetic.
In floating-point arithmetic, a number N is expressed as

    N = M × 2^I     (11.7)

where 1/2 ≤ M < 1, and I is an integer. M and I are referred to as the mantissa


and exponent, respectively. Negative numbers are treated in the same manner
as in fixed-point arithmetic. Floating-point numbers are stored in registers, as
shown in Figure 11.5. The register is subdivided into two segments: one for
the signed mantissa and another for the signed exponent.
Floating-point arithmetic leads to increased dynamic range and improved
precision of processing. However, unlike fixed-point arithmetic, floating-point
arithmetic also leads to increased cost of hardware and reduced speed of
processing because both the mantissa and exponent must be manipulated in
hardware.

11.4 Limit Cycles—Overflow Oscillations


In the sequel two’s complement numbers are assumed to be used and the
associated overflow characteristic is assumed to be employed. In other words,
overflows are not explicitly detected and not corrected in some manner. With a
two’s complement overflow characteristic, a disastrous effect called the over-
flow oscillations can occur after an internal overflow. For discussion purposes,
we will disregard the roundoff error caused by the quantizer and focus only
on the overflow nonlinearity characteristic. Hence, two’s complement and
saturation overflow characteristics are depicted in Figure 11.6.


Figure 11.5 Storage of floating-point numbers.




Figure 11.6 Overflow characteristics. (a) Two’s complement overflow characteristic. (b)
Saturation overflow characteristic.

Assuming that the range of the quantizer is (−1, 1), an overflow function
f must satisfy

f (x) = x for |x| < 1, and |f (x)| ≤ 1 (11.8)

The above relations imply that the magnitude of f (x) never exceeds that of
x, namely,
|f (x)| ≤ |x| (11.9)
This is an essential property in characterizing an overflow function f .
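The two characteristics of Figure 11.6, normalized to the range (−1, 1), can be sketched as follows; both satisfy (11.8) and (11.9):

    f_sat  = @(x) max(min(x, 1), -1);   % saturation: clamp to [-1, 1]
    f_twos = @(x) mod(x + 1, 2) - 1;    % two's complement: wraparound, period 2
    r = -3:0.5:3;
    disp([r; f_twos(r); f_sat(r)].');   % compare the two characteristics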
In order to study overflow oscillations in a state-space model, we consider
an idealized state equation described by

x(k + 1) = Ax(k) (11.10)

In the case of finite-length registers, (11.10) is changed to the form

    x(k + 1) = f[Ax(k)]     (11.11)

with
f [x] = [f (x1 ), f (x2 ), · · · , f (xn )]T
where f is the overflow characteristic. If the filter in (11.10) is stable, x(k)
approaches 0 as k → ∞ in (11.10) for any initial state-variable vector x(0).
However, this may not occur for a system described by (11.11) which depends
on A and f .
We now define the norm of a vector x as

    ||x|| = (x^T x)^{1/2} = (Σ_{i=1}^{n} xi²)^{1/2}     (11.12)

With this definition of vector norm, the norm of a matrix A can be defined as

    ||A|| = max_{x≠0} ||Ax||/||x|| = max_{x≠0} (x^T A^T A x / x^T x)^{1/2}     (11.13)
This norm of A is the maximum factor by which multiplication by A can
increase the length of a vector. If ||A|| < 1, then all vectors x decrease in
length under multiplication by A. In other words, if we can obtain a structure
with a system matrix A such that ||A|| < 1, then zero-input overflow
oscillations will not occur.
For example, consider the following second-order transfer function:

    H(z) = (γ − jθ)/(z + α − jβ) + (γ + jθ)/(z + α + jβ)

         = [2γz + 2(γα + θβ)] / [(z + α)² + β²]     (11.14)

         = [1 1] ( zI₂ − [−α β; −β −α] )^{−1} [γ − θ; γ + θ]

This transfer function can be realized by a state-space model

    x(k + 1) = Ax(k) + bu(k)
    y(k) = cx(k)     (11.15)

where x(k) is a 2 × 1 state-variable vector, u(k) is a scalar input, y(k) is a
scalar output, and

    A = [−α β; −β −α],   b = [γ − θ; γ + θ],   c = [1 1]

Since the relation

    A^T A = AA^T = (α² + β²) I₂     (11.16)

holds, it follows that

    ||A|| = max_{x≠0} ((α² + β²) x^T I₂ x / x^T x)^{1/2} = (α² + β²)^{1/2}     (11.17)

On the other hand, from (11.14) we observe that the filter is stable if its poles
are all strictly inside the unit circle, i.e., α2 + β 2 < 1. This in conjunction
with (11.17) implies that the filter in (11.15) is stable if and only if ||A|| is
less than unity. In the literature, matrices satisfying AT A = AAT are known
as normal matrices, and state-space digital filters with normal system matrix
A are said to be normal digital filters. From (11.16), we see that (11.15) is a
normal digital filter. In general, it also holds that stable normal digital filters
are free of zero-input overflow oscillations.
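A small numerical illustration with arbitrary stable pole parameters:

    % A stable normal realization (11.15) has ||A|| = sqrt(alpha^2+beta^2) < 1,
    % so zero-input overflow oscillations cannot occur.
    alpha = 0.7; beta = 0.6;                 % poles at -alpha +/- j*beta
    A = [-alpha beta; -beta -alpha];
    norm(A'*A - A*A', 'fro')                 % 0, hence A is normal
    [norm(A), hypot(alpha, beta)]            % spectral norm equals pole radius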

11.5 Scaling Fixed-Point Digital Filters to Prevent Overflow


Internal overflows cause large errors and therefore must be prevented. This
can be achieved by appropriately scaling the realization. Scaling constrains
the numerical values of internal variables in the filter to remain in a range
appropriate for the hardware. The range of a filter variable is necessarily
limited due to the use of finite-length registers. For fixed-point number
representations, it can be expressed as a bound on internal variables v(k)
such that
|v(k)| ≤ Δ (11.18)
where Δ is related to the quantization step size q in (11.3) and usually assumed
to be unity.
A typical way of scaling is illustrated using Figures 11.7 and 11.8, where
U(z), Y(z), V(z), and V′(z) are the z-transforms of the input u(k), output y(k),
and internal variables v(k) and v′(k), respectively.

Figure 11.7 A system before scaling.



Figure 11.8 A system after scaling where s is a scaling factor.

In these figures, the transfer function of the system described by


H(z) = G(z)F (z) + D(z) (11.19)
is unchanged before and after the scaling. However, the internal variable is
changed from v(k) to v  (k). Suppose the impulse response of the transfer
function F (z) is denoted by f (i) for i = 0, 1, 2, · · · , then we can write


v(k) = f (i)u(k − i) (11.20)
i=0

Four typical inputs will be considered in the sequel.

(1) If the input is sinusoidal, u(k) = cos(kω), then

    |v(k)| ≤ max_ω |F(e^{jω})|     (11.21)

(2) If the input is bounded, |u(k)| ≤ 1 for any k, then

    |v(k)| ≤ Σ_{i=0}^{∞} |f(i)||u(k − i)| ≤ Σ_{i=0}^{∞} |f(i)| = ||f||₁     (11.22)

(3) If the input has finite energy, Σ_{i=−∞}^{k} u(i)² ≤ 1, then

    |v(k)| ≤ Σ_{i=0}^{∞} |f(i)u(k − i)| ≤ (Σ_{i=0}^{∞} f(i)²)^{1/2} = ||f||₂     (11.23)

(4) If the input is white Gaussian with zero mean and unit variance, then

    E[v²(k)] = Σ_{i=0}^{∞} f(i)² = ||f||₂²     (11.24)

The first three cases give actual bounds on the range of the variable v(k) for
each given input sequence. The last case gives the standard deviation of the
random variable v(k). It is known that

    ||f||₂ ≤ max_ω |F(e^{jω})| ≤ ||f||₁     (11.25)

There are commonly used scaling rules to impose a certain constraint on the
variable v(k) in (11.18). For example, the l1-scaling and l2-scaling rules on the
impulse response for internal variables are given by

    ||f||₁ = Σ_{i=0}^{∞} |f(i)| = 1     (11.26)

and

    δ||f||₂ = δ (Σ_{i=0}^{∞} f(i)²)^{1/2} = 1     (11.27)

respectively, where the parameter δ is subjectively chosen. The scaling factor
s in Figure 11.8 is then chosen to meet the scaling rule, that is,

    s = ||f||₁ = Σ_{i=0}^{∞} |f(i)|   for l1-scaling
                                                          (11.28)
    s = δ||f||₂ = δ (Σ_{i=0}^{∞} f(i)²)^{1/2}   for l2-scaling
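For a concrete first-order section (an illustrative F(z), not taken from the text), the two scaling factors in (11.28) can be computed as:

    % F(z) = 1/(1 - a z^-1) has impulse response f(i) = a^i.
    a = 0.9; K = 400;              % truncation long enough for convergence
    f = a.^(0:K-1);
    delta = 4;                     % subjectively chosen parameter in (11.27)
    s1 = sum(abs(f));              % l1-scaling factor, approx 1/(1-a) = 10
    s2 = delta*norm(f);            % l2-scaling factor, approx delta/sqrt(1-a^2)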

11.6 Roundoff Noise


When an IIR digital filter is implemented in hardware, typically three basic
operations, namely, multiplication by constants (the filter parameters), accu-
mulation of the products, and storage into memory, are involved. Since the
multiplications always increase the number of bits required to represent the
products, the results of accumulations inside the filter must be quantized
eventually. If two B-bit numbers are multiplied together, the product is
2B bits long. In the case where the quantization is performed by rounding, the
error caused by this quantization of internal accumulation is called roundoff
noise.
The model used to represent roundoff noise is illustrated in Figure 11.9
where the quantizer is replaced by an additive white noise source e of variance
q 2 /12 with the quantization step size q.

Figure 11.9 A linear equivalent model of internal quantization of product.

As shown in Figure 11.3, the roundoff noise e is assumed to be uniformly


distributed on [−q/2, q/2] with zero mean. This model also assumes that
noise from different accumulators is uncorrelated, and that each noise source
is uncorrelated with the input.

11.7 Coefficient Sensitivity


The quantization of filter parameters causes another effect due to the use of
finite-length registers. It manifests itself as a deterministic change in the input-
output characteristic of the filter. The effects of coefficient quantization can
be evaluated using the differentiation of the transfer function with respect to
the filter parameters. If serious changes in the input-output characteristic of
the filter are caused by the quantization of coefficients under a certain FWL,
the coefficient word lengths might be lengthened accordingly.
Consider a transfer function H(z) that contains N parameters
{p1 , p2 , · · · , pN }. Let {p̃i } be the FWL version of {pi }, where p̃i = pi + Δpi
with Δpi the parameter perturbation, and let H̃(z) be the transfer function
associated with perturbed parameters {p̃i }. The first-order approximation of
H̃(z) then gives
H̃(z) = H(z) + ΔH(z) (11.29)
where

    ΔH(z) = Σ_{i=1}^{N} (∂H(z)/∂pi) Δpi
i=1

Evidently, smaller ∂H(z)/∂pi for i = 1, 2, ..., N yields smaller transfer-
function error ΔH(z). For a fixed-point implementation with B bits, the
parameter perturbations are considered to be independent, uniformly dis-
tributed random variables within the range [−2^{−B−1}, 2^{−B−1}]. Under these
circumstances, a measure of the transfer-function error can be statistically
defined as

    σ²_ΔH = (1/2πj) ∮_{|z|=1} E[|ΔH(z)|²] (dz/z)     (11.30)

where E[·] denotes the ensemble average operation. Since the {Δpi} are
independent uniformly distributed random variables, it follows that

    E[|ΔH(z)|²] = Σ_{i=1}^{N} |∂H(z)/∂pi|² σ²     (11.31)

where σ² = E[(Δpi)²] = 2^{−2B}/12. Equation (11.31) establishes an analytic


relationship between variations in the transfer function induced by an FWL
realization and parameter sensitivity.
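As an illustration of (11.30) and (11.31), consider a first-order example invented here, with the contour integral evaluated as an average over a uniform frequency grid:

    % Error measure for H(z) = b0/(1 + a1 z^-1), parameters p = {a1, b0}.
    B = 8;  sig2 = 2^(-2*B)/12;               % variance of each perturbation
    a1 = -0.9;  b0 = 1;
    w  = linspace(0, 2*pi, 4096);  zi = exp(-1j*w);   % z^-1 on |z| = 1
    den    = 1 + a1*zi;
    dH_da1 = -b0*zi ./ den.^2;                % dH/da1
    dH_db0 = 1 ./ den;                        % dH/db0
    sigma2 = sig2 * mean(abs(dH_da1).^2 + abs(dH_db0).^2)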

11.8 State-Space Descriptions with Finite Word Length


Consider a stable, controllable and observable nth-order state-space digital
filter (A, b, c, d)n described by

x(k + 1) = Ax(k) + bu(k)


(11.32)
y(k) = cx(k) + du(k)

where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is


a scalar output, and A, b, c and d are real constant matrices of appropriate
dimensions. A block-diagram of the state-space model in (11.32) is depicted
in Figure 11.10.
Taking the finite-precision nature of computer arithmetic into account, an
FWL implementation of (11.32) can be obtained as

    x̃(k + 1) = [A + ΔA] x̃(k) + [b + Δb] u(k) + α(k)
    ỹ(k) = [c + Δc] x̃(k) + [d + Δd] u(k) + β(k)     (11.33)

where x̃(k) is an actual state-variable vector, ỹ(k) is an actual output,


ΔA, Δb, Δc and Δd denote the quantization errors of coefficient matrices
A, b, c and d, respectively, and α(k) and β(k) are an n × 1 roundoff noise
vector and a roundoff noise caused by quantization after multiplications and
additions associated with (A, b) and (c, d), respectively. A block-diagram of
the actual state-space model in (11.33) is depicted in Figure 11.11.


Figure 11.10 A state-space model.




Figure 11.11 An actual state-space model.

Subtracting (11.32) from (11.33) yields

    Δx(k + 1) = A Δx(k) + ΔA x̃(k) + Δb u(k) + α(k)
    Δy(k) = c Δx(k) + Δc x̃(k) + Δd u(k) + β(k)     (11.34)

where
Δx(k) = x̃(k) − x(k), Δy(k) = ỹ(k) − y(k)
Assuming that ΔA Δx(k) ≈ 0 and Δc Δx(k) ≈ 0, (11.34) can be
approximated as

    Δx(k + 1) = A Δx(k) + ΔA x(k) + Δb u(k) + α(k)
    Δy(k) = c Δx(k) + Δc x(k) + Δd u(k) + β(k)     (11.35)

which leads to
Δy(k) = Δyr (k) + Δyc (k) (11.36)
provided that the initial state-variable vector Δx(0) is set to null, i.e., Δx(0) = 0,
where

    Δyr(k) = Σ_{i=0}^{k−1} cA^{k−i−1} α(i) + β(k)

    Δyc(k) = Σ_{i=0}^{k−1} cA^{k−i−1} [ΔA x(i) + Δb u(i)] + Δc x(k) + Δd u(k)

Equation (11.36) shows that the filter’s output error Δy(k) is represented by
the sum of the roundoff error Δyr (k) and error Δyc (k) caused by coefficient
quantization.

A different yet equivalent state-space description of (11.32), (Ā, b̄, c̄, d)n,
can be obtained via a coordinate transformation

    x̄(k) = T^{−1} x(k)     (11.37)

where

    Ā = T^{−1}AT,   b̄ = T^{−1}b,   c̄ = cT

Accordingly, the roundoff error Δyr(k) and the error Δyc(k) caused by coefficient
quantization in (11.36) are transformed into

    Δȳr(k) = Σ_{i=0}^{k−1} cA^{k−i−1} T α(i) + β(k)

    Δȳc(k) = Σ_{i=0}^{k−1} cA^{k−i−1} T [ΔĀ T^{−1} x(i) + Δb̄ u(i)]     (11.38)
             + Δc̄ T^{−1} x(k) + Δd u(k)

respectively, both of which are functions of T. This reveals that the
roundoff error Δȳr(k) and the error Δȳc(k) caused by coefficient quantization
depend on the internal structure of the state-space model.

11.9 Limit Cycle-Free Realization


This section studies conditions for a state-space realization to be free of limit
cycles. We begin by setting ΔA = 0, Δb = 0 and u(k) = 0 in the state
equation of (11.33) which leads to
x̃(k + 1) = Ax̃(k) + α(k) (11.39)
If we perform quantization after multiplications and additions, (11.39) needs
to be modified to a nonlinear equation of the form
x̃(k + 1) = f [Ax̃(k)] (11.40)
where f[·] is a nonlinear function satisfying

    |fi[xi]| ≤ di |xi|  for i = 1, 2, ..., n     (11.41)

with di > 0, which represents quantization after multiplications and additions,
or adder's overflow. The relationship between xi and fi[xi] satisfying (11.41)
is drawn in Figure 11.12.


Figure 11.12 A nonlinear section satisfying (11.41).

It is noted that

    di = ⎧ 1  for quantization by truncation
         ⎨ 1  for adder's overflow               (11.42)
         ⎩ 2  for quantization by rounding

If the nonlinear system in (11.40) is asymptotically stable, then the state-


variable vector x̃(k) converges to 0 as k goes to infinity, hence limit cycles
will not occur.
We now define a Lyapunov function as

V [x̃(k)] = x̃(k)T P x̃(k) (11.43)

where
P = diag{p1 , p2 , · · · , pn } > 0
and compute the difference

    ΔV[x̃(k)] = V[x̃(k + 1)] − V[x̃(k)]

              = x̃(k + 1)^T P x̃(k + 1) − x̃(k)^T P x̃(k)

              = −x̃(k)^T [P − (DA)^T P DA] x̃(k)     (11.44)

                + f[Ax̃(k)]^T P f[Ax̃(k)] − [Ax̃(k)]^T D^T P D Ax̃(k)

where

    D = diag{d1, d2, ..., dn}

From (11.41), it follows that

    f[Ax̃(k)]^T P f[Ax̃(k)] − [Ax̃(k)]^T D^T P D Ax̃(k)
      = Σ_{i=1}^{n} pi { |fi[ei^T Ax̃(k)]|² − di² |ei^T Ax̃(k)|² } ≤ 0     (11.45)

where ei denotes the ith column of the identity matrix I n of dimension n × n.


Hence if there exists a positive-definite diagonal matrix P such that

P − (DA)T P DA > 0 (11.46)

then for arbitrary state-variable vector x̃(k), we obtain

ΔV [x̃(k)] ≤ 0 (11.47)

where equality holds if and only if x̃(k) = 0. Consequently, the existence of


a positive-definite diagonal matrix P satisfying (11.46) will ensure that the
nonlinear system in (11.40) is asymptotically stable, and limit cycles will not
occur for zero input u(k) = 0.
From (11.46), it is observed that in the case of quantization by truncation,
or adder’s overflow, the condition in (11.46) is changed to

P − AT P A > 0 (11.48)

and in the case of quantization by rounding, the condition in (11.46) becomes

P − 4AT P A > 0 (11.49)

For second-order filters, the condition in (11.48) reduces to a very simple


condition [10] that involves the four elements of the 2 × 2 matrix

    A = [a11 a12; a21 a22]     (11.50)

The condition is stated in the following theorem.

Theorem 11.1
Suppose that the magnitudes of the eigenvalues of the 2 × 2 matrix A in
(11.50) are strictly less than unity. There exists a positive-definite diagonal
matrix P for which P − AT P A is positive definite if and only if one of the
following two sets of conditions holds:

(a) a12 a21 ≥ 0 (11.51)


(b) a12 a21 < 0 and |a11 − a22 | + det[A] < 1 (11.52)

Proof
The characteristic polynomial of matrix A is given by
det(zI 2 − A) = z 2 + a1 z + a2 (11.53)
where
a1 = −tr[A] = −(a11 + a22 )
a2 = det[A] = a11 a22 − a12 a21
Recall the well-known conditions called stability triangle as described in
Section 3.2.6 that the roots of the characteristic equation z 2 + a1 z + a2 = 0
satisfy |λ| < 1 if and only if the following conditions hold:
1 − a2 > 0 (11.54a)
1 + a1 + a2 > 0 (11.54b)
1 − a1 + a2 > 0 (11.54c)
It can readily be verified that there exists a positive-definite diagonal matrix
P for which P − AT P A is positive definite if and only if there exists a
nonsingular diagonal matrix T for which I 2 − M is positive definite, where
M = (T −1 AT )T T −1 AT (11.55)
with P = T −T T −1 . The matrix I 2 − M is positive definite if and only if
both its eigenvalues are positive, and this is true if and only if
tr[I 2 − M ] > 0 and det[I 2 − M ] > 0 (11.56)
because these are the sum and product of the eigenvalues. Hence we consider

    det[I₂ − M] = det[zI₂ − M]|_{z=1}
                = 1 − tr[M] + det[M]
                = 1 − tr[M] + (det[A])²     (11.57)

    tr[I₂ − M] = 2 − tr[M]
               > 1 − tr[M] + (det[A])²
               = det[I₂ − M]

which can be obtained from the fact that (det[A])2 < 1 for stable filters. From
(11.57), we need consider only the inequality det[I 2 − M ] > 0, since it will
then follow that tr[I 2 − M ] > 0. In terms of the elements of T , we can derive
from (11.57) that
    det[I₂ − M] = 1 − (a11² + a12² τ² + a21² τ^{−2} + a22²) + (det[A])²     (11.58)
where T = diag{t1, t2} and τ = t2/t1. Applying the arithmetic-geometric
mean inequality

    (a12² τ² + a21² τ^{−2})/2 ≥ |a12 a21|     (11.59)
to (11.58) results in
    det[I₂ − M] ≤ 1 − (a11² + 2|a12 a21| + a22²) + (det[A])²
                 = (1 + det[A])² − (tr[A])² − 2(|a12 a21| − a12 a21)     (11.60)

where equality holds if and only if τ² = |a21/a12|. If a12 a21 ≥ 0, then the
right side of (11.60) is the product of the left sides of (11.54b) and (11.54c),
and is therefore positive. If a12 a21 < 0, then the right side of (11.60) becomes

    1 − (a11² − 2 a12 a21 + a22²) + (det[A])² = (1 − det[A])² − (a11 − a22)²     (11.61)

which is positive if and only if (11.52) holds. This completes the proof of
Theorem 11.1. □
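The proof is constructive: when a12 a21 < 0, the optimal ratio τ² = |a21/a12| yields a diagonal P = T^{−T}T^{−1} satisfying (11.48). The following sketch, with an arbitrary stable test matrix, verifies this:

    A = [0.5 -0.8; 0.4 0.3];                     % stable: |eig(A)| < 1
    y = A(1,2)*A(2,1);
    ok = (y >= 0) || (abs(A(1,1)-A(2,2)) + det(A) < 1);  % (11.51) or (11.52)
    if ok && y < 0
        tau2 = abs(A(2,1)/A(1,2));               % tau^2 from the proof
        P = diag([1, 1/tau2]);                   % T = diag(1, tau)
        eig(P - A'*P*A)                          % all positive: (11.48) holds
    end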

11.10 Summary
In this chapter, we have reviewed the fixed-point and floating-point arithmetic
of binary numbers and the two’s complement representation of negative num-
bers. Limit cycles—overflow oscillations, scaling fixed-point digital filters to
prevent overflow, roundoff noise, and coefficient sensitivity have also been
addressed. Finally, the response of a FWL state-space description and methods
for obtaining limit cycle-free realization have been explored. The material
studied in this chapter provides a basis for the techniques to be presented in
the remaining chapters.

References
[1] W. R. Bennett, “Spectra of quantized signals,” Bell Syst. Tech. J., vol. 27,
pp. 446–472, July 1948.
References 271

[2] B. Widrow, “A study of rough amplitude quantization by means


of Nyquist sampling theory,” IRE Trans. Circuit Theory, vol. CT-3,
pp. 266–276, Dec. 1956.
[3] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, NJ:
Prentice-Hall, 1975.
[4] T. Higuchi, Fundamentals of Digital Signal Processing, Tokyo, Japan,
Shokodo Co., 1986.
[5] R. A. Roberts and C. T. Mullis, Digital Signal Processing, Addison-
Wesley, 1987.
[6] A. Antoniou, Digital Filters, 2nd ed., NJ: McGraw-Hill, 1993.
[7] T. Hinamoto, S. Yokoyama, T. Inoue, W. Zeng and W.-S. Lu, “Anal-
ysis and minimization of l2 -sensitivity for linear systems and two-
dimensional state-space filters using general controllability and observ-
ability Gramians,” IEEE Trans. Circuits Syst. I, vol. 49, no. 9,
pp. 1279–1289, Sept. 2002.
[8] S. K. Mitra, Digital Signal Processing, 3rd ed., NJ: McGraw-Hill, 2006.
[9] C. W. Barnes and A. T. Fam, “Minimum norm recursive digital filters
that are free of overflow limit cycles,” IEEE Trans. Circuits Syst., vol.
CAS-24, no. 10, pp. 569–574, Oct. 1977.
[10] W. L. Mills, C. T. Mullis and R. A. Roberts, “Digital filter realizations
without overflow oscillations” IEEE Trans. Acoust., Speech, Signal
Process., vol. ASSP-26, no. 4, pp. 334–338, Aug. 1978.
12
l2-Sensitivity Analysis and Minimization

12.1 Preview
It is of practical significance in many applications to construct a filter structure
so that the coefficient sensitivity of the digital filter is minimum or nearly
minimum in a certain sense. Due to finite-word-length (FWL) effects caused
by coefficient truncation or rounding, poor sensitivity may lead to degradation
of the transfer characteristics in an FWL implementation of the digital filter.
For instance, the characteristics of an originally stable filter might be so
altered that the filter becomes unstable. This motivates the study of the
coefficient sensitivity minimization problem for digital filters. Techniques
for synthesizing the state-space filter structures that minimize the deviation
of a transfer function caused by coefficient quantization can be divided
into two main classed, namely l1 /l2 -mixed sensitivity minimization [1–5]
and l2 -sensitivity minimization [6–9]. In [6–9], it has been argued that the
sensitivity measure based on the l2 -norm only is more natural and reasonable
relative to the l1 /l2 -mixed sensitivity minimization. More recently, the pro-
blem of minimizing l2 -sensitivity subject to l2 -scaling constraints has
been examined for state-space digital filters [10, 11]. It is known that
the use of scaling constraints can be beneficial for suppressing overflow
oscillations [12, 13].
In this chapter, the l2 -sensitivities with respect to the coefficient matrix
for a state-space digital filter is analyzed, and an l2 -sensitivity measure is
derived. Next, a simple method for minimizing the l2 -sensitivity measure is
presented by using a recursive matrix equation. In addition, two techniques
for minimizing the l2 -sensitivity measure subject to l2 -scaling constraints are
described: one employs a quasi-Newton algorithm and the other relies on a
Lagrange function.
Numerical experiments are presented to illustrate the validity and effec-
tiveness of these algorithms and demonstrate their performance.

273
274 l2 -Sensitivity Analysis and Minimization

12.2 l2 -Sensitivity Analysis


Consider a stable, controllable and observable state-space digital filter
(A, b, c, d)n described by
x(k + 1) = Ax(k) + bu(k)
(12.1)
y(k) = cx(k) + du(k)
where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is a
scalar output, and A, b, c and d are n × n, n × 1, 1 × n and 1 × 1 real constant
matrices, respectively, and these matrices are given by
⎡ ⎤ ⎡ ⎤
a11 a12 · · · a1n b1
⎢ a21 a22 · · · a2n ⎥ ⎢ b2 ⎥
⎢ ⎥ ⎢ ⎥
A = ⎢ .. .. . . ⎥ , b = ⎢ .. ⎥
⎣ . . . . . ⎦
. ⎣ . ⎦
an1 an2 · · · ann bn

c = c1 c2 · · · cn
The transfer function of the filter in (12.1) can be expressed as
H(z) = c(zI n − A)−1 b + d (12.2)
The l2 -sensitivities of the transfer function with respect to coefficient matrices
A, b, c and d are computed as follows.

Definition 12.1
Let X and f (X) be an m × n real matrix and a scalar complex function of
X differentiable with respect to all entries of X, respectively. The sensitivity
function of f (X) with respect to X is then defined as [5]
∂f (X) ∂f (X)
SX = , (S X )ij = (12.3)
∂X ∂xij
where both xij and (X)ij denote the (i, j)th entry of matrix X, respectively.
By virtue of (12.2), Definition 12.1 and the formula
∂A−1 ∂A −1
= −A−1 A (12.4)
∂aij ∂aij
the sensitivities of H(z) with respect to elements aij , bi , cj and d are
evaluated by
12.2 l2 -Sensitivity Analysis 275

∂H(z) ∂H(z)
= gi (z)fj (z), = gi (z)
∂aij ∂bi
(12.5)
∂H(z) ∂H(z)
= fj (z), =1
∂cj ∂d
respectively, where
⎡ ⎤
f1 (z)
⎢ f2 (z) ⎥
⎢ ⎥
f (z) = (zI n − A)−1 b = ⎢ .. ⎥
⎣ . ⎦
fn (z)

g(z) = c(zI n − A)−1 = [g1 (z), g2 (z), · · · , gn (z)]


Equation (12.5) is equivalent to
∂H(z) ∂H(z)
= [f (z)g(z)]T , = g T (z)
∂A ∂b
(12.6)
∂H(z) ∂H(z)
= f (z), =1
∂cT ∂d

Definition 12.2
Let X(z) be an m × n complex-valued matrix function of complex variable
z, and xpq (z) be the (p, q)th entry of X(z). The l2 -norm of X(z) is
defined as
1
2π m n 2
1 2
X(z)2 = xpq (ejω ) dω
2π 0
p=1 q=1
(12.7)
   1
2
1 dz
= tr X(z)X H (z)
2πj |z|=1 z

Using (12.6) and (12.7), the overall l2 -sensitivity measure for the transfer
function in (12.2) is defined by
     
 ∂H(z) 2  ∂H(z) 2  ∂H(z) 2
So =    +   +  
∂A 2  ∂b 2  ∂cT 2 (12.8)
 2  2
= [f (z)g(z)]T 2 + g T (z)2 + f (z)2
2
276 l2 -Sensitivity Analysis and Minimization

The term d in (12.2) and the sensitivity with respect to it are coordinate
independent, and therefore they are neglected here.
It is easy to show that the l2 -sensitivity measure in (12.8) can be
expressed as
So = tr[N (I n )] + tr[W o ] + tr[K c ] (12.9)
where

1 dz
Kc = f (z)f T (z −1 )
2πj |z|=1 z

1 dz
Wo = g T (z)g(z −1 )
2πj |z|=1 z

1 dz
N (P ) = [f (z)g(z)]T P −1f (z −1 )g(z −1 )
2πj |z|=1 z
Here P is an n × n nonsingular matrix that will later be related to a
coordinate transformation for the state-variable vector of the digital filter
(see Section 12.3 below). Noting that
(zI n − A)−1 = z −1 I n + Az −2 + A2 z −3 + · · ·
∞ (12.10)
= Ai z −(i+1)
i=0
and utilizing Cauchy’s integral theorem
 
1 k dz
1, k=0
z = (12.11)
2πj C z 0, k = 0
where C is a counterclockwise contour that encircles the origin, matrices K c
and W o in (12.9) can be written as

Kc = Ak bbT (Ak )T
k=0
(12.12)

k T T k
Wo = (A ) c cA
k=0
respectively, and they can be obtained by solving the Lyapunov equations
K c = AK c AT + bbT
(12.13)
W o = A T W o A + cT c
12.3 Realization with Minimal l2 -Sensitivity 277

respectively. The matrices K c and W o in (12.12) are called the controlla-


bility Grammian and observability Grammian, respectively. Similarly, matrix
N (P ) in (12.9) can be derived from

N (P ) = H T (i)P −1 H(i) (12.14)
i=0

where
i
H(i) = Al bcAi−l
l=0
It is noted that N (P ) can also be obtained by solving the Lyapunov equation
in closed form, as shown in (12.21) and (12.22) later.

12.3 Realization with Minimal l2 -Sensitivity


If a coordinate transformation defined by
x(k) = T −1 x(k) (12.15)
is applied to the filter in (12.1), we obtain a new realization (A, b, c, d)n
characterized by
x(k + 1) = Ax(k) + bu(k)
(12.16)
y(k) = c x(k) + du(k)
where
A = T −1 AT , b = T −1 b, c = cT
Accordingly, the controllability and observability Grammians relating to
(A, b, c, d)n can be expressed as
K c = T −1 K c T −T , W o = T T W oT (12.17)
respectively. From (12.2) and (12.16), it is obvious that the transfer function
H(z) is invariant under the coordinate transformation (12.15). In addi-
tion, under the coordinate transformation (12.15), matrix N (I n ) becomes
T T N (P )T and the l2 -sensitivity measure in (12.9) is changed to
So (T ) = tr[T T N (T T T )T ] + tr[T T W o T ] + tr[T −1 K c T −T ] (12.18)
which is equivalent to
S(P ) = tr[N (P )P ] + tr[W o P ] + tr[K c P −1 ] (12.19)
278 l2 -Sensitivity Analysis and Minimization

where P = T T T . By noting that

f (z)g(z) = T −1 f (z)g(z)T
 −1  
 −1 zI n − A −bc 0 (12.20)
= T 0
0 zI n − A T
where
f (z) = (zI n − A)−1 b, g(z) = c(zI n − A)−1
and denoting the observability Grammian of a composite system f (z)g(z)
in (12.20) by Y , it is easy to show that for an arbitrary P = T T T , matrix
N (P ) can be obtained by solving the Lyapunov equation
 T   
A bc A bc P −1 0
Y = Y + (12.21)
0 A 0 A 0 0

and then taking the lower-right n × n block of Y as N (P ), namely,


 
 0
N (P ) = 0 I n Y (12.22)
In

It is well known that the solution of minimizing S(P ) in (12.19) with respect to
P must satisfy the Karush-Kuhn-Tucker (KKT) condition ∂S(P )/∂P = 0.
Using the formula for evaluating the matrix gradient [5, 14]
∂ tr[M X]
= MT
∂X
(12.23)
∂ tr[M X −1 ]  T
= − X −1 M X −1
∂X
the gradient of S(P ) in (12.19) with respect to P is found to be
∂S(P )
= N (P ) − P −1 M (P )P −1 + W o − P −1 K c P −1 (12.24)
∂P
where

1 dz
M (P ) = F (z −1 )G(z −1 )P [F (z)G(z)]T
2πj |z|=1 z

We remark that tr[N (P )P ] = tr[M (P )P −1 ]. If the controllability


Grammian of a composite system f (z)g(z) in (12.20) is denoted by X,
12.3 Realization with Minimal l2 -Sensitivity 279

it is easily shown that for an arbitrary P = T T T , matrix M (P ) can be


obtained by solving the Lyapunov equation
   T  
A bc A bc 0 0
X= X + (12.25)
0 A 0 A 0 P

and then taking the upper-left n × n block of X as M (P ), namely,


 
 In
M (P ) = I n 0 X (12.26)
0
Therefore, the KKT condition becomes

P F (P )P = G(P ) (12.27)

where

F (P ) = N (P ) + W o , G(P ) = M (P ) + K c

Equation (12.27) is highly nonlinear with respect to P . An effective approach


for solving (12.27) is to relax it into the recursive second-order matrix equation

P k+1 F (P k )P k+1 = G(P k ) (12.28)

where P k is assumed to be known from the previous recursion. Note that if


the matrix sequence {P k } converges to its limit matrix, say P , then (12.28)
converges to (12.27) as k goes infinity.
Noting that for a positive definite W and a semi-positive definite M ,
matrix equation P W P = M has the unique solution [5]
1 1 1 1 1
P = W − 2 [W 2 M W 2 ] 2 W − 2 (12.29)

the solution P k+1 of (12.28) is given by


1 1 1 1 1
P k+1 = F (P k )− 2 [F (P k ) 2 G(P k )F (P k ) 2 ] 2 F (P k )− 2 (12.30)

where P k is the solution of the previous iteration, and the initial estimate P 0
is often chosen as P 0 = I n . This iteration process continues until

|S(P k+1 ) − S(P k )| < ε (12.31)

is satisfied where ε > 0 is a prescribed tolerance. If the iteration is terminated


at step k, P k is claimed to be a solution point.
280 l2 -Sensitivity Analysis and Minimization

We now obtain the optimal coordinate transformation matrix T that solves the
problem of minimizing S(P ) in (12.19). As analyzed earlier, the optimal T
assumes the form 1
T = P 2U (12.32)
where P 1/2 is square root of the matrix P obtained above, and U is an arbitrary
n × n orthogonal matrix. The optimal realization with minimal l2 -sensitivity
can readily be constructed by substituting (12.32) into (12.16).

12.4 l2 -Sensitivity Minimization Subject to l2 -Scaling


Constraints Using Quasi-Newton Algorithm
12.4.1 l2 -Scaling and Problem Formulation
Let X(z) and U (z) be the z-transforms of the state-variable vector x(k) and
the input u(k) in (12.1), respectively. Then the relation of the state variables
to the input in the frequency domain can be expressed as
X(z) = f (z)U (z) (12.33)
where f (z) is a transfer function from the input u(k) to the state-variable
vector x(k), and described by

f (z) = (zI n − A)−1 b = Ak−1 bz −k
k=1

whose impulse response is seen as the sequence {b, Ab, · · · , Ak−1 b, · · · }. If


the input has finite energy, that is,
k−1
u2 (l) ≤ 1 (12.34)
l=−∞

then we have

|xi (k)| ≤ |eTi Al−1 bu(k − l)|
l=1
 ∞ 1 (12.35)
eTi Al−1 bbT (AT )l−1 ei
2

l=1

= eTi K c ei
12.4 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints 281

where xi (k) denotes the ith element of the state-variable vector x(k), ei is
the ith column of an identity matrix I n of dimension n × n, and matrix

Kc = Ak bbT (Ak )T
k=0

is the controllability Grammian of the filter in (12.1) that can be obtained by


solving the Lyapunov equation in (12.13). In the above, the l2 -scaling rule is
given by
δ 2 eTi K c ei = 1 for i = 1, 2, · · · , n (12.36)
where there is no loss of generality in assuming that scalar δ can be chosen
as δ = 1.
We are now in a position to apply the l2 -scaling rule to the new realization
in (12.16), i.e.,
eTi K c ei = eTi T −1 K c T −T ei = 1 for i = 1, 2, · · · , n (12.37)
As a result, the problem of l2 -sensitivity minimization subject to l2 -scaling
constraints is now formulated as follows: Given the matrices A, b, and c,
obtain an n × n nonsingular matrix T which minimizes the l2 -sensitivity
measure So (T ) in (12.18) subject to the l2 -scaling constraints in (12.37).

12.4.2 Minimization of (12.18) Subject to l2 -Scaling


Constraints — Using Quasi-Newton Algorithm
When the state-space model in (12.1) is assumed to be stable and controllable,
the controllability Grammian K c in (12.12) is symmetric and positive-definite.
1 1 1
This implies that K c2 satisfying K c = K c2 K c2 is also symmetric and positive-
definite.
By defining
− 12
T̂ = T T K c (12.38)
the l2 -scaling constraints in (12.37) can be written as
−T −1
eTi T̂ T̂ ei = 1 for i = 1, 2, · · · , n (12.39)
−1
The constraints in (12.39) simply state that each column in matrix T̂ must
−1
be a unit vector. If matrix T̂ is assumed to have the form
 
−1 t1 t2 tn
T̂ = , ,··· , (12.40)
||t1 || ||t2 || ||tn ||
282 l2 -Sensitivity Analysis and Minimization

so that (12.39) is always satisfied. From (12.18) and (12.38), it follows that
T T
J(T̂ ) = tr[T̂ N̂ (T̂ )T̂ ] + tr[T̂ Ŵ o T̂ ] + n
(12.41)
−T −1 T
= tr[T̂ M̂ (T̂ )T̂ ] + tr[T̂ Ŵ o T̂ ] + n

where
1 1 T 1 1 1 1
N̂ (T̂ ) = K c2 N (K c2 T̂ T̂ K c2 )K c2 , Ŵ o = K c2 W o K c2
−1 1 T 1
− 12
M̂ (T̂ ) = K c 2 M (K c2 T̂ T̂ K c2 )K c

From the foregoing arguments, the problem of obtaining an n × n nonsin-


gular matrix T which minimizes So (T ) in (12.18) subject to the l2 -scaling
constraints in (12.37) can be converted into an unconstrained optimization
problem of obtaining an n × n nonsingular matrix T̂ which minimizes J(T̂ )
in (12.41).
We now apply a quasi-Newton algorithm [15] to minimize (12.41) with
respect to matrix T̂ in (12.40). Let x be the column vector that collects the
independent variables in matrix T̂ , i.e.,

x = (tT1 , tT2 , · · · , tTn )T (12.42)

Then, J(T̂ ) is a function of x and is denoted by J(x). The algorithm starts


with a trivial initial point x0 obtained from an initial assignment T̂ = I n .
Then, in the kth iteration, a quasi-Newton algorithm updates the most recent
point xk to point xk+1 as

xk+1 = xk + αk dk (12.43)

where

dk = −S k ∇J(xk ), αk = arg min J(xk + αdk )
α
  T
γ Tk S k γ k δ k δ Tk − δ k γ Tk S k +S k γ k δ k
S k+1 = S k + 1 +
γ Tk δ k γ k δk
T γ k δk
T

S 0 = I, δ k = xk+1 − xk , γ k = ∇J(xk+1 ) − ∇J(xk )

In the above, ∇J(x) is the gradient of J(x) with respect to x, and S k is a


positive-definite approximation of the inverse Hessian matrix of J(x).
12.4 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints 283

This iteration process continues until

|J(xk+1 ) − J(xk )| < ε (12.44)

is satisfied where ε > 0 is a prescribed tolerance. If the iteration is terminated


at step k, the xk is viewed as a solution point.

12.4.3 Gradient of J(x)


The implementation of (12.43) requires the gradient of J(x), which can be
efficiently evaluated using the closed-form expressions derived below.
Each term of the objective function in (12.41) has the form J(x) =
T −T −1
tr[T̂ N T̂ ] (or J(x) = tr[T̂ M T̂ ]) which, in the light of (12.40), can
be expressed as
 −1   
t1 t2 tn t1 t2 tn −T
J(x) = tr , ,··· ,N , ,··· ,
||t1 || ||t2 || ||tn ||
||t1 || ||t2 || ||tn ||
(12.45)
In order to compute ∂J(x)/∂tij , we perturb the ith component of vector tj
by a small amount, say Δ, and keep the rest of T̂ unchanged. If we denote
−1
the perturbed j the column of T̂ by t̃j /||t̃j ||, then a linear approximation
of t̃j /||t̃j || can be obtained as

t̃j tj  t  tj
j
 + Δ∂ /∂tij = − Δ g ij (12.46)
||t̃j || ||tj || ||tj || ||tj ||

where  t 
j 1
g ij = −∂ /∂tij = (tij tj − ||tj ||2 ei )
||tj || ||tj ||3
Now let T̂ ij be the matrix obtained from T̂ with a perturbed (i, j)th
component, then we obtain
−1 −1
T̂ ij = T̂ − Δg ij eTj (12.47)

and up to the first-order, the matrix inversion formula [16, p. 655] gives

ΔT̂ g ij eTj T̂
T̂ ij = T̂ +  T̂ + ΔT̂ g ij eTj T̂ (12.48)
1 − ΔeTj T̂ g ij
284 l2 -Sensitivity Analysis and Minimization

For convenience, we define T̂ ij = T̂ + ΔS and write


T    T
T̂ ij N T̂ ij = T̂ + ΔS N T̂ + ΔS
(12.49)
T T T
= T̂ N T̂ + ΔSN T̂ + ΔT̂ N S + Δ2 SN S T

which implies that


 T  T    T
tr T̂ ij N T̂ ij − tr T̂ N T̂  Δ tr S N + N T T̂ (12.50)

provided that Δ is sufficiently small. Hence


 T  T  T
∂ tr T̂ N T̂ tr T̂ ij N T̂ ij − tr T̂ N T̂
= lim
∂tij Δ→0 Δ (12.51)
   T
= tr S N + N T T̂
−1 −1
Similarly, if we define T̂ ij = T̂ − ΔS, then we arrive at
 −T −1  −T −1  −T −1
∂tr T̂ M T̂ tr T̂ ij M T̂ ij − tr T̂ M T̂
= lim
∂tij Δ→0 Δ (12.52)
   −1
= −tr S T M + M T T̂

Referring to (12.51), it follows from (12.41) and (12.48) that


 T
∂tr T̂ N̂ (T̂ )T̂  T
= 2 tr T̂ g ij eTj T̂ N̂ (T̂ )T̂
∂tij (12.53)
T
= 2 eTj T̂ N̂ (T̂ )T̂ T̂ g ij

and  T
∂tr T̂ Ŵ o T̂  T
= 2 tr T̂ g ij eTj T̂ Ŵ o T̂
∂tij (12.54)
T
= 2 eTj T̂ Ŵ o T̂ T̂ g ij

Referring to (12.52), it follows from (12.41) and (12.47) that


12.5 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints 285

 −T −1
∂tr T̂ M̂ (T̂ )T̂  −1
= −2 tr ej g Tij M̂ (T̂ )T̂
∂tij
−1 (12.55)
= −2 g Tij M̂ (T̂ )T̂ ej
−T
= −2 eTj T̂ M̂ (T̂ )g ij

As a result, the gradient of J(x) now can be evaluated in closed-form as

∂J(T̂ ) J(T̂ ij ) − J(T̂ )  


= lim = 2 β1 − β2 + β3 (12.56)
∂tij Δ→0 Δ
where
T −T
β1 = eTj T̂ N̂ (T̂ )T̂ T̂ g ij , β2 = eTj T̂ M̂ (T̂ )g ij
T
β3 = eTj T̂ Ŵ o T̂ T̂ g ij

12.5 l2 -Sensitivity Minimization Subject to l2 -Scaling


Constraints Using Lagrange Function
12.5.1 Minimization of (12.19) Subject to l2 -Scaling
Constraints — Using Lagrange Function
The problem of minimizing S(P ) in (12.19) subject to l2 -scaling constraints
in (12.37) is a constrained nonlinear optimization problem where matrix P is
the variable. If we sum the n constraints in (12.37) up, then we have

tr[T −1 K c T −T ] = tr[K c P −1 ] = n (12.57)

Consequently, the problem of minimizing (12.19) subject to the constraints in


(12.37) can be relaxed into the following problem:

minimize S(P ) in (12.19) with respect to P


(12.58)
subject to tr[K c P −1 ] = n

Although clearly a solution of problem (12.58) is not necessarily a solution of


the problem of minimizing (12.19) subject to l2 -scaling constraints in (12.37),
it is important to stress that the ultimate solution we seek for is not matrix P
but a nonsingular matrix T that is related to the solution of the problem of
286 l2 -Sensitivity Analysis and Minimization

minimizing (12.19) subject to l2 -scaling constraints in (12.37) as P = T T T .


If matrix P is a solution of problem (12.58) and P 1/2 denotes a matrix square
root of P , i.e., P = P 1/2 P 1/2 , then it is easy to see that any matrix T
of the form T = P 1/2 U where U is an arbitrary orthogonal matrix still
holds the relation P = T T T . As will be shown shortly, under the constraint
tr[K c P −1 ] = n in (12.58) there exists an orthogonal matrix U such that
matrix T = P 1/2 U satisfies l2 -scaling constraints in (12.37), where P 1/2 is
a square root of the solution matrix P for problem (12.58).
For these reasons, we now address problem (12.58) as the first step of our
solution procedure. To solve (12.58), we define the Lagrange function of the
problem as

J(P , λ) = tr[N (P )P ] + tr[W o P ] + tr[K c P −1 ]


(12.59)
+ λ(tr[K c P −1 ] − n)

where λ is a Lagrange multiplier. It is well known that the solution of


problem (12.58) must satisfy the Karush-Kuhn-Tucker (KKT) conditions
∂J(P , λ)/∂P = 0 and ∂J(P , λ)/∂λ = 0. From (12.23), the gradients are
found to be
∂J(P , λ)
= N (P ) − P −1 M (P )P −1
∂P
+ W o − (λ + 1)P −1 K c P −1 (12.60)
∂J(P , λ)
= tr[K c P −1 ] − n
∂λ
where M (P ) can be obtained by solving (12.25) and (12.26). Hence the KKT
conditions become

P F (P )P = G(P , λ), tr[K c P −1 ] = n (12.61)

where
F (P ) = N (P ) + W o
G(P , λ) = M (P ) + (λ + 1)K c
The first equation in (12.61) is highly nonlinear with respect to P . An effective
approach for solving the first equation in (12.61) is to relax it into the recursive
second-order matrix equation

P k+1 F (P k )P k+1 = G(P k , λk+1 ) (12.62)


12.5 l2 -Sensitivity Minimization Subject to l2 -Scaling Constraints 287

where P k is assumed to be known from the previous recursion. Recalling the


solution given by (12.29), the solution P k+1 of (12.62) is given by
1 1 1 1 1
P k+1 = F (P k )− 2 [F (P k ) 2 G(P k , λk+1 )F (P k ) 2 ] 2 F (P k )− 2 (12.63)
To derive a recursive formula for the Lagrange multiplier λ, we employ (12.61)
to write
tr[P F (P )] = tr[M (P )P −1 ] + n(λ + 1) (12.64)
which naturally suggests the recursion for λ
tr[P k F (P k )] − tr[M (P k )P −1
k ]
λk+1 = −1 (12.65)
n
The iteration process starts with an initial estimate P 0 , and continues until
S(P k+1 ) − S(P k ) + n − tr[K c P −1
k+1 ] < ε (12.66)
is satisfied for a prescribed tolerance ε > 0. If the iteration is terminated at
step k, P k is claimed to be a solution point.

12.5.2 Derivation of Nonsingular T from P to Satisfy


l2 -Scaling Constraints
As the second step of the solution procedure, having obtained an optimal
P , we now turn our attention to the construction of the optimal coordinate
transformation matrix T that solves the problem of minimizing (12.19) subject
to l2 -scaling constraints in (12.37). As is given in (12.32), the optimal T
assumes the form 1
T = P 2U (12.67)
where U is an n × n orthogonal matrix that can be determined to satisfy the
l2 -scaling constraints as follows. From (12.17) and (12.67), it follows that
1 1
K c = U T P − 2 K cP − 2 U (12.68)

To find an n × n orthogonal matrix U such that the matrix K c satisfies


the l2 -scaling constraints in (12.37), we perform the eigenvalue-eigenvector
1 1
decomposition for the symmetric positive-definite matrix P − 2 K c P − 2 as
1 1
P − 2 K c P − 2 = RΘRT (12.69)
where Θ = diag{θ1 , θ2 , · · · , θn } with θi > 0 for all i and R is an n × n
orthogonal matrix. Next, an n × n orthogonal matrix S such that
288 l2 -Sensitivity Analysis and Minimization

eTi SΘS Tei = 1 for i = 1, 2, · · · , n (12.70)


can be obtained by a numerical procedure [13, p. 278]. Using (12.68)–(12.70),
it can be readily verified that the orthogonal matrix U = RS T leads to a K c
in (12.68) whose diagonal elements are equal to unity, hence the l2 -scaling
constraints in (12.37) are satisfied. This matrix U together with (12.67) yields
a solution for the problem of minimizing (12.19) subject to the l2 -scaling
constraints in (12.37) as
1
T = P 2 RS T (12.71)

12.6 Numerical Experiments


12.6.1 Filter Description and Initial l2 -Sensitivity
Consider a third-order lowpass IIR digital filter described by
0.15940z 3 + 0.47821z 2 + 0.47825z + 0.15937
H(z) = 10−1
z 3 − 1.97486z 2 + 1.55616z − 0.45377
Magnitude response of this filter is depicted in Figure 12.1.
This filter can be realized by a state-space model (A, b, c, d)3 in (12.1) as

Figure 12.1 The magnitude response of a lowpass digital filter.


12.6 Numerical Experiments 289

⎡ ⎤ ⎡ ⎤
0 1 0 0
A=⎣ 0 0 1 ⎦, b=⎣ 0 ⎦
0.45377 −1.55616 1.97486 1

c = 10−1 0.2317 0.2302 0.7930 , d = 0.01594
Carrying out the computation of (12.13), (12.21), (12.22), (12.25) and (12.26),
the controllability and observability Grammians K c and W o , and matrices
N (I 3 ) and M (I 3 ) were computed as
⎡ ⎤
17.061835 14.886464 9.602768
K c = ⎣ 14.886464 17.061835 14.886464 ⎦
9.602768 14.886464 17.061835
⎡ ⎤
0.048104 −0.119291 0.095427
W o = ⎣ −0.119291 0.311061 −0.249968 ⎦
0.095427 −0.249968 0.231012
⎡ ⎤
8.921384 −22.046468 17.916293
N (I 3 ) = ⎣ −22.046468 55.671739 −46.052035 ⎦
17.916293 −46.052035 42.522104
⎡ ⎤
35.705076 32.086502 22.476116
M (I 3 ) = ⎣ 32.086502 35.705076 32.086502 ⎦
22.476116 32.086502 35.705076
and the l2 -sensitivity measure in (12.9) was found to be

So = 158.890911

Performing the l2 -scaling to the above state-space model (A, b, c, d)3 with a
diagonal coordinate-transformation matrix given by

T o = diag{4.130597 4.130597 4.130597}

led the controllability Grammian to

K c = T −1 −T
o K cT o
⎡ ⎤
1.000000 0.872501 0.562821
= ⎣ 0.872501 1.000000 0.872501 ⎦
0.562821 0.872501 1.000000
290 l2 -Sensitivity Analysis and Minimization

and the l2 -sensitivity subject to l2 -scaling was found to be

So = 120.184738

12.6.2 l2 -Sensitivity Minimization


Choosing P 0 = I 3 in (12.30) as the initial estimate, and setting tolerance to
ε = 10−8 in (12.31), it took the algorithm addressed in Section 12.3 forty-one
iterations to converge to
⎡ ⎤
81.462531 48.380250 17.978611
P = ⎣ 48.380250 39.029141 24.141251 ⎦
17.978611 24.141251 23.949438

which yields
⎡ ⎤
8.189112 3.732952 0.682678
1
T =P2 = ⎣ 3.732952 4.301777 2.566890 ⎦
0.682678 2.566890 4.110287

where U in (12.32) was chosen as U = I 3 . The profile of the l2 -sensitivity


measure in (12.19) during the first 41 iterations of the algorithm is shown in
Figure 12.2.

Figure 12.2 Profile of S(P k ) during the first 41 iterations.


12.6 Numerical Experiments 291

An equivalent realization was then obtained from (12.16) as


⎡ ⎤ ⎡ ⎤
0.608861 0.297687 0.046275 0.148865
A = ⎣ −0.320667 0.570460 0.440629 ⎦ , b = ⎣ −0.413801 ⎦
−0.082097 −0.388935 0.795539 0.476987

c = 0.329811 0.389074 0.400853
Moreover, the controllability and observability Grammians K c and W o , and
matrices N (P ) and M (P ) were computed from (12.17), (12.21), (12.22),
(12.25) and (12.26) as
⎡ ⎤
0.121737 0.026720 0.058892
K c = ⎣ 0.026720 0.455129 −0.024594 ⎦
0.058892 −0.024594 0.834658
⎡ ⎤
0.167774 0.125597 0.069289
W o = ⎣ 0.125597 0.425887 0.065911 ⎦
0.069289 0.065911 0.817617
⎡ ⎤
0.189796 −0.479778 0.376453
N (P ) = ⎣ −0.479778 1.249598 −1.002489 ⎦
0.376453 −1.002489 0.921438
⎡ ⎤
68.481534 59.708525 36.979189
M (P ) = ⎣ 59.708525 68.481534 59.708525 ⎦
36.979189 59.708525 68.481534

With realization (A, b, c, d)3 , the l2 -sensitivity measure in (12.19) was


minimized to
S(P ) = 7.832683

12.6.3 l2 -Sensitivity Minimization Subject to l2 -Scaling


Constraints Using Quasi-Newton Algorithm
The quasi-Newton algorithm was applied to minimize (12.41) by choosing
T̂ = I 3 as an initial assignment, and setting tolerance to ε = 10−8 in (12.44).
It took the algorithm addressed in Section 12.4 twelve iterations to converge
to the solution
⎡ ⎤
1.399512 −0.399054 0.519219
T̂ = ⎣ −0.909296 0.955306 0.384011 ⎦
−1.034718 −0.498614 0.380517
292 l2 -Sensitivity Analysis and Minimization

which is equivalent to
⎡ ⎤
4.425883 −0.877680 −4.360270
T = ⎣ 2.866741 1.652464 −2.805767 ⎦
2.021688 2.654510 −0.495080

The profile of J(x) during the first 12 iterations of the algorithm is depicted
in Figure 12.3.
As a result, an equivalent realization was derived from (12.16) as
⎡ ⎤ ⎡ ⎤
0.664789 0.072468 0.067395 0.668510
A = ⎣ 0.074199 0.717025 0.590431 ⎦ , b = ⎣ −0.005654 ⎦
0.002388 −0.449754 0.593046 0.679708

c = 0.328860 0.228207 −0.204876

In addition, the controllability and observability Grammians K c and W o , and


matrices N (P ) and M (P ) were computed from (12.17), (12.21), (12.22),
(12.25) and (12.26) as

Figure 12.3 Profile of J(xk ) during the first 12 iterations.


12.6 Numerical Experiments 293
⎡ ⎤
1.000000 0.622354 0.458665
K c = ⎣ 0.622354 1.000000 0.029904 ⎦
0.458665 0.029904 1.000000
⎡ ⎤
0.226005 0.168659 0.033336
W o = ⎣ 0.168659 0.222675 0.007233 ⎦
0.033336 0.007233 0.218695
⎡ ⎤
0.401254 −1.014637 0.795713
N (P ) = ⎣ −1.014637 2.643913 −2.120091 ⎦
0.795713 −2.120091 1.948564
⎡ ⎤
32.235281 28.094505 17.370799
M (P ) = ⎣ 28.094505 32.235281 28.094505 ⎦
17.370799 28.094505 32.235281
and the minimized l2 -sensitivity measure in (12.41) was found to be

J(T̂ ) = 8.672132

12.6.4 l2 -Sensitivity Minimization Subject to l2 -Scaling


Constraints Using Lagrange Function
The recursive matrix equation in (12.63) together with (12.65) was applied to
minimize (12.59) by choosing P 0 = I 3 in (12.63) and (12.65) as an initial
assignment, and setting tolerance to ε = 10−6 in (12.66). It took the algorithm
addressed in Section 12.5 two-hundred sixty-seven iterations to converge to
the solution
⎡ ⎤
39.370678 23.471408 8.776617
P = ⎣ 23.471408 18.821159 11.571209 ⎦
8.776617 11.571209 11.378742

which in conjunction with (12.71) led to


⎡ ⎤
−3.372610 5.212122 0.911024
T = ⎣ −0.385158 4.309519 −0.317577 ⎦
1.806782 2.848438 0.026109

By using the coordinate transformation matrix T obtained above, an


equivalent realization was derived from (12.16) as
294 l2 -Sensitivity Analysis and Minimization
⎡ ⎤ ⎡ ⎤
0.737652 −0.291843 0.388074 0.385940
A = ⎣ 0.460932 0.635801 0.085061 ⎦ , b = ⎣ 0.098326 ⎦
−0.329056 0.012485 0.601407 0.866211

c = 0.056268 0.445851 0.015868
In addition, the controllability and observability Grammians K c and W o , and
matrices N (P ) and M (P ) were computed from (12.17), (12.21), (12.22),
(12.25) and (12.26) as
⎡ ⎤
1.000000 0.546707 0.546707
K c = ⎣ 0.546707 1.000000 0.028559 ⎦
0.546707 0.028559 1.000000
⎡ ⎤
0.222437 0.110113 0.109850
W o = ⎣ 0.110113 0.295772 0.007599 ⎦
0.109850 0.007599 0.149166
⎡ ⎤
0.401254 −1.014637 0.795713
N (P ) = ⎣ −1.014637 2.643914 −2.120092 ⎦
0.795713 −2.120092 1.948565
⎡ ⎤
32.235271 28.094497 17.370795
M (P ) = ⎣ 28.094497 32.235271 28.094497 ⎦
17.370795 28.094497 32.235271
and the minimized l2 -sensitivity measure in (12.19) was found to be
S(P ) = 8.672133
where J(P , λ) = 8.672132 and λ = −0.777542.
The profiles of S(P ) in (12.19) and tr[K c P −1 ] during the first 267
iterations of the algorithm are depicted in Figures 12.4.

12.7 Summary
The minimization problem of l2 -sensitivity for state-space digital filters has
been considered with or without l2 -scaling constraints. The problem free
from l2 -scaling constraints has been solved by employing a recursive matrix
equation. The constrained optimization problem has been solved by two
iterative methods: one converts the constrained optimization problem at hand
into an unconstrained problem and solves it using a quasi-Newton algorithm,
References 295

Figure 12.4 Profiles of S(P k ) and tr[K c P −1


k ] during the first 267 iterations.

while the other relaxes the constraints into a single constraint on matrix trace
and solves the relaxed problem with an efficient matrix iteration scheme based
on the Lagrange function. Simulation results in numerical experiments have
demonstrated the validity and effectiveness of these techniques.
296 l2 -Sensitivity Analysis and Minimization

References
[1] L. Thiele, “Design of sensitivity and round-off noise optimal state-
space discrete systems,” Int. J. Circuit Theory Appl., vol. 12, pp. 39–46,
Jan. 1984.
[2] V. Tavsanoglu and L. Thiele, “Optimal design of state-space digital filter
by simultaneous minimization of sensitivity and roundoff noise,” IEEE
Trans. Circuits Syst., vol. CAS-31, no. 10, pp. 884–888, Oct. 1984.
[3] L. Thiele, “On the sensitivity of linear state-space systems,” IEEE Trans.
Circuits Syst., vol. CAS-33, no. 5, pp. 502–510, May 1986.
[4] M. Iwatsuki, M. Kawamata and T. Higuchi, “Statistical sensitivity
and minimum sensitivity structures with fewer coefficients in discrete
time linear systems,” IEEE Trans. Circuits Syst., vol. CAS-37, no. 1,
pp. 72–80, Jan. 1989.
[5] G. Li, B. D. O. Anderson, M. Gevers and J. E. Perkins, “Optimal
FWL design of state-space digital systems with weighted sensitivity
minimization and sparseness consideration,” IEEE Trans. Circuits Syst.
I, vol. 39, no. 5, pp. 365–377, May 1992.
[6] W.-Y. Yan and J. B. Moore, “On L2 -sensitivity minimization of linear
state-space systems,” IEEE Trans. Circuits Syst. I, vol. 39, no. 8,
pp. 641–648, Aug. 1992.
[7] G. Li and M. Gevers, “Optimal synthetic FWL design of
state-space digital filters,” in Proc. ICASSP 1992, vol. 4,
pp. 429–432.
[8] M. Gevers and G. Li, Parameterizations in Control, Estimation and
Filtering Problems: Accuracy Aspects. New York: Springer-Verlag,
1993.
[9] T. Hinamoto, S. Yokoyama, T. Inoue, W. Zeng and W.-S. Lu,
“Analysis and minimization of L2 -sensitivity for linear systems and
two-dimensional state-space filters using general controllability and
observability Grammians,” IEEE Trans. Circuits Syst. I, vol. 49, no. 9,
pp. 1279–1289, Sept. 2002.
[10] T. Hinamoto, H. Ohnishi and W.-S. Lu, “Minimization of L2 -sensitivity
for state-space digital filters subject to L2 -dynamic-range scaling con-
straints,” IEEE Trans. Circuits Syst.-II, vol. 52, no. 10, pp. 641–645,
Oct. 2005.
[11] T. Hinamoto, K. Iwata and W.-S. Lu, “L2 -sensitivity Minimization of
one- and two-dimensional state-space digital filters subject to L2 -scaling
References 297

constraints,” IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1804–1812,
May 2006.
[12] C. T. Mullis and R. A. Roberts, “Synthesis of minimum roundoff noise
fixed-point digital filters,” IEEE Trans. Circuits Syst., vol. CAS-23,
no. 9, pp. 551–562, Sept. 1976.
[13] S. Y. Hwang, “Minimum uncorrelated unit noise in state-space digital
filtering,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25,
no. 4, pp. 273–281, Aug. 1977.
[14] L. L. Scharf, Statistical Signal Processing, Reading, MA: Addison-
Wesley, 1991.
[15] R. Fletcher, Practical Methods of Optimization, 2nd ed., Wiley,
New York, 1987.
[16] T. Kailath, Linear System, Englewood Cliffs, N.J.: Prentice-Hall, 1980.
13
Pole and Zero Sensitivity Analysis
and Minimization

13.1 Preview
When a transfer function with infinite accuracy coefficients is designed so
as to meet the filter specification requirements and the transfer function is
implemented by a state-space model with a finite binary representation, the
state-space parameters must be truncated or rounded to fit the finite-word-
length (FWL) constraints. This coefficient quantization inevitably changes the
characteristics of the digital filter. For instance, it may alter a stable filter to an
unstable one. This motivates the study of minimizing coefficients sensitivity.
As is well known, there are several ways to define sensitivity of a filter with
respect to its coefficients. Two of them are based on a mixed l1 /l2 norm and
a pure l2 norm, respectively. One of these sensitivity definitions measures
changes of a certain transfer function, while the other is defined in terms of
the poles and zeros of a filter. Several techniques concerning minimization of
the l1 /l2 - and l2 -sensitivity measures have been proposed [1–7]. Alternatively,
pole and zero sensitivity of a filter with respect to state-space parameters has
been analyzed and its reduction and minimization have been addressed [7–12].
A method for minimizing a zero sensitivity measure subject to minimal pole
sensitivity for state-space digital filters has also been explored without taking
l2 -scaling into account [11]. Recently, techniques for minimizing a weighted
pole and zero sensitivity measure subject to l2 -scaling constraints have been
developed [12].
In this chapter, a weighted measure for pole and zero sensitivity for state-
space digital filters is introduced, and the problem of minimizing this measure
is studied. To this end, an iterative technique for minimizing this measure
is presented by employing a recursive matrix equation. A simple method for
minimizing a zero sensitivity measure subject to minimal pole sensitivity is
also given by pursuing an optimal coordinate transformation. Furthermore, the

299
300 Pole and Zero Sensitivity Analysis and Minimization

minimization of the above sensitivity measure subject to l2 -scaling constraints


is performed by extending the aforementioned solution method. This method
relaxes the constraints into a single constraint on matrix trace and solves the
relaxed problem with an efficient matrix iteration scheme where the Lagrange
multiplier is determined via a bisection method. The problem of minimizing
the above sensitivity measure subject to l2 -scaling constraints is also explored
by applying a quasi-Newton algorithm where we convert the constrained
optimization problem at hand into an unconstrained problem and then solve
it iteratively by using a quasi-Newton algorithm.
Numerical experiments are included to demonstrate the validity and
effectiveness of the above techniques.

13.2 Pole and Zero Sensitivity Analysis


Consider a stable, controllable, and observable state-space digital filter
(A, b, c, d)n of order n described by

x(k + 1) = Ax(k) + bu(k)


(13.1)
y(k) = cx(k) + du(k)

where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k)


is a scalar output, and A, b, c and d are real constant matrices of appropri-
ate dimensions. The transfer function of the digital filter in (13.1) can be
expressed as
H(z) = c(zI n − A)−1 b + d (13.2)
Assuming that a direct path from the input to the output exists in (13.1),
i.e., d = 0, the poles and zeros of H(z) are given by {λl } = λ(A) and
{vl } = λ(Z), respectively, where an n × n matrix Z is defined by

Z = A − d−1 bc (13.3)

Notice that if d = 0 then we obtain [13]


  −1 −1
H(z)−1 = d−1 − d−1 c zI n − A − d−1 bc bd
1 (13.4)
=
c(zI n − A)−1 b + d
This reveals that the zeros of the filter in (13.1) coincides with the eigenvalues
of matrix Z = A − d−1 bc.
13.2 Pole and Zero Sensitivity Analysis 301

The pole sensitivity matrix for the lth eigenvalue λl of A is defined by


⎡ ⎤
∂λl ∂λl
⎢ ∂a11 · · ·
∂a1n ⎥
∂λl ⎢ ⎢

.. ⎥
= ⎢ ... ..
. . ⎥ (13.5)
∂A ⎢ ⎥
⎣ ∂λl ∂λl ⎦
···
∂an1 ∂ann
and the zero sensitivity matrix for the lth eigenvalue vl of Z is defined by
⎡ ⎤
∂vl ∂vl
⎢ ∂z11 · · ·
⎢ ∂z1n ⎥

∂vl ⎢ .. ⎥
= ⎢ ... ..
. . ⎥ (13.6)
∂Z ⎢ ⎥
⎣ ∂vl ∂vl ⎦
···
∂zn1 ∂znn
Lemma 13.1
Let xp (l) be a right eigenvector of A corresponding to λl and y p (l) be the
reciprocal left eigenvector of A that corresponds to xp (l). If A has a full set
of n linearly independent eigenvectors, then the following holds:
∂λl T
= xp (l)y H
p (l) for l = 1, 2, · · ·, n (13.7)
∂A
Proof
Since A has n linearly independent eigenvectors {xp (l), l = 1, 2, · · ·, n}, we
have
Axp (l) = λl xp (l) for l = 1, 2, · · ·, n (13.8)
The reciprocal basis vectors y p (l) for l = 1, 2, · · ·, n are defined by
 
X p = xp (1), xp (2), · · ·, xp (n)
  (13.9)
Y p = y p (1), y p (2), · · ·, y p (n) = X −H p

Hence
YH H
p X p = I n ⇐⇒ y p (k)xp (l) = δkl (13.10)
Multiplying (13.8) from the left by y p (l)H and using (13.10) yields

λl = y H
p (l)Axp (l) (13.11)
302 Pole and Zero Sensitivity Analysis and Minimization

By virtue of the two identities from linear algebra


tr[AB] = tr[BA]
∂ ∂ (13.12)
tr[AB] = tr[BA] = B T
∂A ∂A
the differentiation of (13.11) with respect to A becomes
∂λl ∂ ∂
= y (l)H Axp (l) = tr[Axp (l)y p (l)H ] = (xp (l)y p (l)H )T
∂A ∂A p ∂A
(13.13)
This completes the proof of Lemma 13.1. 

Lemma 13.2
Let xz (l) be a right eigenvector of Z corresponding to vl and y z (l) be the
reciprocal left eigenvector that corresponds to xz (l). If Z has n linearly
independent eigenvectors, then we have
∂vl T
= xz (l)y H
z (l) for l = 1, 2, · · ·, n (13.14)
∂Z
Proof
The proof of this lemma is identical to that of Lemma 13.1. 

Lemma 13.3
If (13.8) holds, it can be verified that
∂vl ∂vl ∂vl ∂vl T
= , = −d−1 c
∂A ∂Z ∂b ∂Z
(13.15)
∂vl −1 T ∂vl ∂vl −2 T ∂vl T
= −d b , =d b c
∂c ∂Z ∂d ∂Z
Proof
Let aij and zij be the (i, j)th entry of A and Z, respectively. By virtue of vl =
yH H H H
z (l)Zxz (l), Zxz (l) = vl xz (l), y z (l)Z = vl y z (l) and y z (l)xz (l) = 1,
we can write
∂vl ∂y Hz (l) ∂Z ∂xz (l)
= Zxz (l) + y Hz (l) xz (l) + y H
z (l)Z
∂aij ∂aij ∂aij ∂aij
∂Z  ∂y H (l) ∂xz (l) 
z
= yH
z (l) xz (l) + vl xz (l) + y H
z (l) (13.16)
∂aij ∂aij ∂aij
∂Z ∂Z ∂vl
= yH
z (l) xz (l) = y H
z (l) xz (l) =
∂aij ∂zij ∂zij
13.2 Pole and Zero Sensitivity Analysis 303

which is equivalent to ∂vl /∂A = ∂vl /∂Z. Similarly, letting bi and cj be the
ith entry of b and the jth entry of c, respectively, we can write
∂vl ∂Z
= yH
z (l) xz (l) = −d−1 y H
z (l)ei c xz (l)
∂bi ∂bi
(13.17)
= −d−1 eTi y H T T T −1 T ∂vl cT
z (l) xz (l) c = −d ei
∂Z
∂vl ∂Z
= yH
z (l) xz (l) = −d−1 y H T
z (l)bej xz (l)
∂cj ∂cj
(13.18)
∂vl
= −d−1 bT y H T T
z (l) xz (l) ej = −d−1 bT ej
∂Z
∂vl ∂Z
= yH
z (l) xz (l) = d−2 y H
z (l)bcxz (l)
∂d ∂d
(13.19)
= d−2 bT y H T T T −2 T ∂vl cT
z (l) xz (l) c = d b
∂Z
because
∂vl ∂Z
= yH
z (l) xz (l) = y H T
z (l)ei ej xz (l)
∂zij ∂zij
= eTi y H T T H T T
z (l) xz (l) ej = (y z (l) xz (l) )ij

which is equivalent to
∂vl
= yH T
z (l) xz (l)
T
∂Z
This completes the proof of Lemma 13.3. 
We now define the pole sensitivity measure Jp and the zero sensitivity
measure Jz for the digital filter in (13.1) as
n  
n  
 ∂λl 2  ∂λl T 2
Jp =    
 ∂A  =  ∂A  (13.20a)
l=1 F l=1 F
n        
  ∂vl 2  ∂vl 2  ∂vl 2  ∂vl 2 
Jz =   +  +  +  (13.20b)
 ∂A   ∂b   ∂c   ∂d 
l=1 F F F F

respectively, where ||M ||F denotes the Frobenius norm of an m × n complex


matrix M , which is defined by
 n
m  1

|(M )ij |2
2
||M ||F =
i=1 j=1
304 Pole and Zero Sensitivity Analysis and Minimization

Here, (M )ij stands for the (i, j)th element of M . By noting that

||M ||2F = tr[M M H ] = tr[M H M ] = ||M H ||2F (13.21)

and using (13.7), (13.14) and (13.15), we can write (13.20a) and (13.20b) as
n

Jp = xH
p (l)xp (l) yH
p (l)y p (l) (13.22a)
l=1
n

Jz = xH 2
z (l)xz (l) + αl yH 2
z (l)y z (l) + βl (13.22b)
l=1
respectively, where

αl = |d−1 cxz (l)|, βl = |d−1 y H


z (l)b| (13.22c)

Lemma 13.4
The pole sensitivity measure Jp in (13.22a) is lower bounded by

Jp ≥ n (13.23)

where the equality holds if and only if matrix A is normal, i.e., AAT = AT A.

Proof
By the Schwarz inequality and (13.10), it follows from (13.22a) that
n
 n
 n

Jp = ||xp (l)||22 ||y p (l)||22 ≥ |y p (l)H xp (l)|2 = δll = n
l=1 l=1 l=1
(13.24)
If y p (l) = μl xp (l) with μl = 0 holds for l = 1, 2, · · ·, n then Jp = n. In this
case, it follows from (13.10) that

Dμ X H
p Xp = In (13.25)

where D μ = diag{μ1 , μ2 , · · ·, μn }. This means that X −1 H


p = D μ X p holds.
On the other hand, Axp (l) = λl xp (l) for l = 1, 2, · · ·, n can be written as

A = X p D λ X −1
p (13.26)

where X p = [xp (1), xp (2), · · ·, xp (n)] and D λ = diag{λ1 , λ2 , · · ·, λn }.


By substituting X −1
p = Dμ X H p into (13.26), it can be verified that
13.2 Pole and Zero Sensitivity Analysis 305

AAH = AH A. Conversely, if A is normal, there exists an n × n unitary


matrix QH such that QH AQ = D λ which is equivalent to Q = X p .
Alternatively, it follows from (13.10) that

YH
p Xp = In (13.27)

Since QH Q = X H p X p = I n , we arrive at Y p = X p which means that


y p (l) = xp (l) for l = 1, 2, · · ·, n. As a result, we obtain Jp = n. This
completes the proof of Lemma 13.4. 

Lemma 13.5
The zero sensitivity measure Jp in (13.22b) is lower bounded by
n

Jz ≥ (1 + |αl βl |)2 (13.28a)
l=1

where the equality holds if and only if matrix Z = A − d−1 bc is normal, i.e.,
ZZ T = Z TZ subject to its right eigenvector matrix X z satisfying
 |α | |α | |αn | 
1 2
XH z X z = diag , , · · ·, (13.28b)
|β1 | |β2 | |βn |
 
with X z = xz (1), xz (2), · · ·, xz (n) .

Proof
From (13.22b) and the arithmetic-geometric mean inequality, we obtain
n

Jz = ||xz (l)||22 ||y z (l)||22 + αl2 ||y z (l)||22 + βl2 ||xz (l)||22 + αl2 βl2
l=1
n

≥ ||xz (l)||22 ||y z (l)||22 + 2|αl βl | ||y z (l)||2 ||xz (l)||2 + αl2 βl2
l=1


= Jz
(13.29a)
where the equality holds if and only if

|αl | ||y z (l)||2 = |βl | ||xz (l)||2 for l = 1, 2, · · ·, n (13.29b)

Moreover, from (13.10) and (13.29a) it can be derived that


306 Pole and Zero Sensitivity Analysis and Minimization

n

Jz = ||xz (l)||22 ||y z (l)||22 + 2|αl βl | ||y z (l)||2 ||xz (l)||2 + αl2 βl2
l=1
n

≥ |y z (l)H xz (l)|2 + 2|αl βl | |y z (l)H xz (l)| + αl2 βl2 (13.30a)
l=1
n

= (1 + |αl βl |)2
l=1

where the equality holds if and only if

y z (l) = κl xz (l), κl = 0 for l = 1, 2, · · ·, n (13.30b)

In order for the equality in both (13.29a) and (13.30a) to hold simultaneously,
both (13.29b) and (13.30b) must be satisfied. The conditions in (13.29b) and
(13.30b) are satisfied provided that
 |α | |α | |αn | −1
1 2
Y z = X z diag , , · · ·, (13.31)
|β1 | |β2 | |βn |
 
with Y z = y z (1), y z (2), · · ·, y z (n) = X −H z . Evidently, (13.31) is
equivalent to (13.28b) and this completes the proof of Lemma 13.5. 

13.3 Realization with Minimal Pole and Zero Sensitivity


13.3.1 Weighted Pole and Zero Sensitivity Minimization Without
Imposing l2 -Scaling Constraints
Now if a coordinate transformation defined by

x(k) = T −1 x(k) (13.32)

is applied to the digital filter in (13.1), the new realization (A, b, c, d)n can
be characterized by

A = T −1 AT , b = T −1 b, c = cT (13.33)

From (13.2) and (13.33), it is observed that the transfer function H(z) is
invariant under the coordinate transformation in (13.32). Note that the right
eigenvectors of the original and transformed system matrices A and A
13.3 Realization with Minimal Pole and Zero Sensitivity 307

corresponding to λl , namely xp (l) and xp (l), and their counterpart reciprocal


left eigenvectors, namely y p (l) and y p (l), are related as
xp (l) = T −1 xp (l), y p (l) = T T y p (l) (13.34)
for l = 1, 2, · · ·, n, respectively. Similarly, the right eigenvectors of the
original and transformed system matrices Z and Z, and their counterpart
reciprocal left eigenvectors are related as
xz (l) = T −1 xz (l), y z (l) = T T y z (l) (13.35)
for l = 1, 2, · · ·, n, respectively. Therefore, for the new realization
(A, b, c, d)n specified in (13.33), the pole and zero sensitivity measures in
(13.22a) and (13.22b) can be expressed as
n

Jp (T ) = xH
p (l)T
−T −1
T xp (l) yH T
p (l)T T y p (l) (13.36a)
l=1
n

Jz (T ) = xH
z (l)T
−T −1
T xz (l) + αl2 yH T 2
z (l)T T y z (l) + βl
l=1
(13.36b)
respectively.
Under these circumstances, we examine a weighted pole and zero
sensitivity measure Jγ (T ) defined by
Jγ (T ) = γJp (T ) + (1 − γ)Jz (T ) (13.37)
where 0 ≤ γ ≤ 1 is a weighting factor to control the trade-off between the two
component sensitivities in the sense that a Jγ (T ) with a greater γ represents
a measure that places more emphasis on pole sensitivity, while a Jγ (T ) with
a smaller γ serves as a measure that weights more heavily on zero sensitivity.
In addition, by setting γ to unity or zero Jγ (T ) becomes Jp (T ) or Jz (T ),
respectively.
The problem of minimizing the weighted pole and zero sensitivity measure
is now formulated as follows: For given A, b, c and d, obtain an n × n
transformation matrix T which minimizes Jγ (T ) in (13.37) with γ specified
by the designer.
The pole and zero sensitivity measures in (13.36a) and (13.36b) can also
be expressed in terms of matrix P = T T T as
n
    
Jp (T ) = tr xp (l)xH
p (l)P
−1
tr y p (l)y H
p (l)P (13.38a)
l=1
308 Pole and Zero Sensitivity Analysis and Minimization
n
    
Jz (T ) = tr xz (l)xH
z (l)P
−1
+ αl2 tr y z (l)y H 2
z (l)P + βl
l=1
(13.38b)
respectively. To minimize Jγ (T ) in (13.37) with respect to an n × n positive-
definite symmetric matrix P , we compute
∂Jγ (T )
= M γ (P ) − P −1 N γ (P )P −1 (13.39)
∂P
where
n
  
M γ (P ) = γ tr xp (l)xH
p (l)P
−1
y p (l)y H
p (l)
l=1
n
  
+ (1 − γ) tr xz (l)xH
z (l)P
−1
+ αl2 y z (l)y H
z (l)
l=1
n
  
N γ (P ) = γ tr y p (l)y H H
p (l)P xp (l)xp (l)
l=1
n
  
+ (1 − γ) tr y z (l)y H 2 H
z (l)P + βl xz (l)xz (l)
l=1

By setting ∂Jγ (T )/∂P = 0, we obtain

P M γ (P )P = N γ (P ) (13.40)

The equation in (13.40) is highly nonlinear with respect to P . An effective


approach for solving (13.40) is to relax it into the recursive second-order
matrix equation
P k+1 M γ (P k )P k+1 = N γ (P k ) (13.41)
The recursion starts with an initial estimate P 0 . Thus P k in (13.41) is
assumed to be known from the previous recursion, and the next iterate P k+1
is given by
1 1 1 1 1
P k+1 = M γ (P k )− 2 [M γ (P k ) 2 N γ (P k )M γ (P k ) 2 ] 2 M γ (P k )− 2
(13.42)
This iteration process continues until
  1   1 

Jγ P k+1
2
− Jγ P k2  < ε (13.43)
13.3 Realization with Minimal Pole and Zero Sensitivity 309
1
is satisfied for a prescribed tolerance ε > 0 where P 2 stands for the square
root of matrix P . If the iteration is terminated at step k, P k is claimed to be
a solution point.
We now turn our attention to the construction of a coordinate transfor-
mation matrix T that solves the problem of minimizing the weighted pole
and zero sensitivity measure in (13.37). Since P = T T T , an optimizing
T assumes the form 1
T = P 2U (13.44)
where U is an arbitrary n × n orthogonal matrix.

13.3.2 Zero Sensitivity Minimization Subject to Minimal


Pole Sensitivity
Suppose the coordinate√transformation matrix T o yields Jp (T√o ) = n, it can
be verified that Jp (± ζ T o ) = n holds for any scalar ± ζ. Therefore,
one can reduce the zero sensitivity as much as possible while keep the
pole sensitivity unaltered by adequately tuning the value of ζ. To this end,
we compute
√ n 
∂Jz (± ζ T o )
=ζ −2 ζ 2 αl2 y H T
z (l)T o T o y z (l)
∂ζ
l=1 (13.45)

−T −1
−βl2 xHz (l)T o T o xz (l)


By solving ∂Jz (± ζ T o )/∂ζ = 0, the optimal value of ζ is found to be

 n


 βl2 xH −T −1
z (l)T o T o xz (l)

ζ =  l=1n (13.46)
 
 2 H T
α y (l)T T y (l)
l z o o z
l=1

This implies that√the optimal coordinate transformation matrix T opt that


minimizes Jz (± ζ T o ) with respect to ζ subject to Jp (T o ) = n can be
obtained as 
T opt = ± ζ T o (13.47)
where ζ is given by (13.46). Notice that by applying the arithmetic-geometric
mean inequality, the following holds:
310 Pole and Zero Sensitivity Analysis and Minimization

Jz (T o ) − Jz (T opt )
n
 n

= αl2 y H T
z (l)T o T o y z (l) + βl2 xH −T −1
z (l)T o T o xz (l)
l=1 l=1

 n n
 
−2 αl2 y H
z (l)T T T
o o zy (l) βl2 xH −T −1
z (l)T o T o xz (l)
l=1 l=1

≥0
(13.48)
opt
Therefore, as expected, by using the optimal T instead of matrix T o , the
zero sensitivity always gets reduced.

13.4 Pole Zero Sensitivity Minimization Subject to


l2 -Scaling Constraints Using Lagrange Function
13.4.1 l2 -Scaling Constraints and Problem Formulation
The controllability Grammian K c of the digital filter in (13.1) plays an
important role in the dynamic-range scaling of the state-variable vector x(k),
and K c can be obtained by solving the Lyapunov equation
K c = AK c AT + bbT (13.49)
With an equivalent state-space realization as specified in (13.33), the
controllability Grammian for the transformed system assumes the form
K c = T −1 K c T −T (13.50)
If l2 -scaling constraints are imposed on the new state-variable vector x(k) in
(13.32), it is required that
eTi K c ei = eTi T −1 K c T −T ei = 1 for i = 1, 2, · · ·, n (13.51)
The problem being considered here is to obtain an n×n transformation matrix
T that minimizes (13.37) subject to the l2 -scaling constraints in (13.51).

13.4.2 Minimization of (13.37) Subject to l2 -Scaling


Constraints — Using Lagrange Function
To minimize (13.37), where Jp (T ) and Jz (T ) are given by (13.38a) and
(13.38b), respectively, with respect to an n × n symmetric positive-definite
13.4 Pole Zero Sensitivity Minimization 311

matrix P subject to l2 -scaling constraints in (13.51), we define the Lagrange


function  
Iγ (P , ξ) = Jγ (T ) + ξ tr[K c P −1 ] − n (13.52)
where ξ is a Lagrange multiplier, and compute the gradients
∂Iγ (P , ξ)
= M γ (P ) − P −1 N γ (P )P −1 − ξP −1 K c P −1
∂P
(13.53)
∂Iγ (P , ξ)
= tr[K c P −1 ] − n
∂ξ
where M γ (P ) and N γ (P ) are defined in (13.39). Set ∂Iγ (P , ξ)/∂P = 0
and ∂Iγ (P , ξ)/∂ξ = 0 to yield
P M γ (P )P = Gγ (P , ξ), tr[K c P −1 ] = n (13.54)
where
Gγ (P , ξ) = N γ (P ) + ξK c
To solve the highly nonlinear equations in (13.54), we propose an iterative
technique which starts with an initial estimate P 0 and relaxes the first equation
in (13.54) into a recursive second-order matrix equation
P k+1 M γ (P k )P k+1 = Gγ (P k , ξk+1 ) (13.55)
where P k is known from the previous recursion. Solving (13.55) for a
symmetric and positive-definite P k+1 , we obtain
1 1 1 1 1
P k+1 = M γ (P k )− 2 [M γ (P k ) 2 Gγ (P k , ξk+1 )M γ (P k ) 2 ] 2 M γ (P k )− 2
(13.56)
where the Lagrange multiplier ξk+1 can be efficiently obtained using a
bisection method so that
  

f (γ, ξk+1 ) = n − tr K̃ k G̃k (γ, ξk+1 )  < ε (13.57)
with
1 1 1
G̃k (γ, ξk+1 ) = [M γ (P k ) 2 Gγ (P k , ξk+1 )M γ (P k ) 2 ]− 2
1 1
K̃ k = M γ (P k ) 2 K c M γ (P k ) 2
This iteration process continues until
    

Iγ P k+1 , ξk+1 − Iγ P k , ξk  < ε (13.58)
is satisfied for a prescribed tolerance ε > 0. If the iteration is terminated at
step k, we set P = P k and claim it to be a solution.
312 Pole and Zero Sensitivity Analysis and Minimization

13.4.3 Derivation of Nonsingular T from P to Satisfy


l2 -Scaling Constraints
Having obtained an optimal P , we now turn our attention to the construction
of the optimal coordinate transformation matrix T that solves the problem of
minimizing (13.37) subject to l2 -scaling constraints in (13.51). As is analyzed
earlier, the optimal T assumes the form
1
T = P 2U (13.59)

where U is an n × n orthogonal matrix that can be determined to satisfy the


l2 -scaling constraints as follows. From (13.50) and (13.59), it follows that
1 1
K c = U T P − 2 K cP − 2 U (13.60)

To find an n × n orthogonal matrix U such that the matrix K c satisfies l2 -


scaling constraints in (13.51), we perform the eigen-decomposition for the
1 1
symmetric positive-definite matrix P − 2 K c P − 2 as
1 1
P − 2 K c P − 2 = RΘRT (13.61)

where Θ = diag{θ1 , θ2 , · · ·, θn } with θi > 0 for all i and R is an n × n


orthogonal matrix. Next, an n × n orthogonal matrix S such that

eTi SΘS Tei = 1 for i = 1, 2, · · ·, n (13.62)

can be obtained by a numerical procedure [18, p.278]. Using (13.60)-(13.62),


it can be readily verified that the orthogonal matrix U = RS T leads to a K c
in (13.60) whose diagonal elements are equal to unity, hence the l2 -scaling
constraints in (13.51) are satisfied. This matrix U together with (13.59) yields
the solution for the problem of minimizing (13.37) subject to the l2 -scaling
constraints in (13.51) as
1
T = P 2 RS T (13.63)

13.5 Pole and Zero Sensitivity Minimization Subject to


l2 -Scaling Constraints Using Quasi-Newton Algorithm
13.5.1 l2 -Scaling and Problem Formulation
This section explores a technique for pole and zero sensitivity minimization
subject to l2 -scaling constraints, where the coordinate transformation matrix
13.5 Pole and Zero Sensitivity Minimization Subject to l2 -Scaling Constraints 313

is optimized using an unconstrained optimization approach known as quasi-


Newton algorithm.
We start by defining
− 12
T̂ = T T K c (13.64)
which leads (13.51) to
−T −1
eTi T̂ T̂ ei = 1 for i = 1, 2, · · ·, n (13.65)
−1
Evidently, these constraints are always satisfied if matrix T̂ assumes the
form  
−1 t1 t2 tn
T̂ = , , · · ·, (13.66)
||t1 || ||t2 || ||tn ||
Defining an n2 × 1 vector x = (tT1 , tT2 , · · ·, tTn )T that consists of independent
−1
variables in T̂ , the pole and zero sensitivity measures Jp (T ) and Jz (T ) in
(13.36a) and (13.36b) can be written as
n
 −1 −T T
Jp (x) = x̂H
p (l)T̂ T̂ x̂p (l) ŷ H
p (l)T̂ T̂ ŷ p (l)
l=1
n
 −1 −T T
Jz (x) = x̂H
z (l)T̂ T̂ x̂z (l) + αl2 ŷ H 2
z (l)T̂ T̂ ŷ z (l) + βl
l=1
(13.67)
respectively, where
−1 1
x̂p (l) = K c 2 xp (l), ŷ p (l) = K c2 y p (l)
−1 1
x̂z (l) = K c 2 xz (l), ŷ z (l) = K c2 y z (l)
The original constrained optimization problem formulated in Section 13.4.1
can now be converted into an unconstrained optimization problem of obtaining
an n2 × 1 vector x which minimizes
Jγ (x) = γJp (x) + (1 − γ)Jz (x), 0≤γ≤1 (13.68)

13.5.2 Minimization of (13.68) Subject to l2 -Scaling


Constraints — Using Quasi-Newton Algorithm
We now solve the unconstrained problem by a quasi-Newton algorithm that
−1
starts with an initial point x0 obtained from the assignment T̂ = I n . In the
314 Pole and Zero Sensitivity Analysis and Minimization

kth iteration, the quasi-Newton algorithm, known as the Broyden-Fletcher-


Goldfarb-Shanno (BFGS) algorithm [14], updates the current point xk to point
xk+1 as [15]
xk+1 = xk + αk dk (13.69)
where
 
dk = −S k ∇Jγ (xk ), αk = arg min Jγ (xk + αdk )
α
 
ϕTk S k ϕk δ k δ Tk δ k ϕTk S k + S k ϕk δ Tk
S k+1 = S k + 1 + − , S0 = I
ϕTk δ k ϕTk δ k ϕTk δ k
δ k = xk+1 −xk , ϕk = ∇Jγ (xk+1 ) − ∇Jγ (xk )

Here, ∇Jγ (x) denotes the gradient of Jγ (x) with respect to x, and S k is a
positive-definite approximation of the inverse Hessian matrix of Jγ (xk ).
This iteration process continues until

|Jγ (xk+1 ) − Jγ (xk )| < ε (13.70)

is satisfied where ε > 0 is a prescribed tolerance.

13.5.3 Gradient of J(x)


The gradient of Jγ (x) can be evaluated using closed-form expressions as

∂Jγ (x) Jγ (T̂ ij ) − Jγ (T̂ )


= lim
∂tij Δ→0 Δ
(13.71)
∂Jp (x) ∂Jz (x)
=γ + (1 − γ)
∂tij ∂tij

where T̂ ij is the matrix obtained from T̂ with a perturbed (i, j)th component
by Δ. It follows that [16, p. 655]

ΔT̂ g ij eTj T̂
T̂ ij = T̂ + T̂ + ΔT̂ g ij eTj T̂
1− ΔeTj T̂ g ij
−1 −1 (13.72)
T̂ ij = T̂ − Δg ij eTj
 
t
g ij = −∂ ||tj || /∂tij = ||t1||3 (tij tj − ||tj ||2 ei )
j j
13.6 Numerical Experiments 315

and
∂Jp (x)  H
n
T
= x̂p (l)M̂ (T̂ )x̂p (l) ŷ H
p (l)T̂ T̂ ŷ p (l)
∂tij
l=1
−1 −T

+ x̂H
p (l)T̂ T̂ x̂p (l) ŷ H
p (l)N̂ (T̂ )ŷ p (l)
(13.73)
∂Jz (x)  H
n
T
= x̂z (l)M̂ (T̂ )x̂z (l) ŷ H 2
z (l)T̂ T̂ ŷ z (l) + βl
∂tij
l=1
−1 −T

+ x̂H
z (l)T̂ T̂ x̂z (l) + αl2 ŷ H
z (l)N̂ (T̂ )ŷ z (l)

where
−T −1
M̂ (T̂ ) = − [ g ij eTj T̂ + T̂ ej g Tij ]
T T 
N̂ (T̂ ) = T̂ ej g Tij T̂ + T̂ g ij eTj T̂

13.6 Numerical Experiments


13.6.1 Filter Description and Initial Pole and Zero Sensitivity
Consider a state-space digital filter (A, b, c, d)4 in (13.1) described by
⎡ ⎤ ⎡ ⎤
3.7183 1 0 0 0.1755
⎢ −5.2153 0 1 0 ⎥ ⎢
−3 ⎢ 0.0178 ⎥

A=⎢ ⎣ 3.2689 0 0 1 ⎦
⎥, b = 10 ⎣ 0.1652 ⎦
−0.7724 0 0 0 0.0052
 
c= 1 0 0 0 , d = 0.0227
for which the eigenvalues of matrices A and Z were found to be
λ = 0.963556 ± j0.156437, 0.895594 ± j0.092081
v = 1.147681 ± j0.329541, 0.707604 ± j0.202980
respectively. By using (13.22c), we obtained
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
α1 12.215615 β1 0.386463
⎢ α2 ⎥ ⎢ 12.215615 ⎥ ⎢ β2 ⎥ ⎢ 0.386463 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥=⎢ ⎥
⎣ α3 ⎦ = ⎣ 9.687213 ⎦ , ⎣ β3 ⎦ ⎣ 0.310390 ⎦
α4 9.687213 β4 0.310390
316 Pole and Zero Sensitivity Analysis and Minimization

The original pole and zero sensitivity measures in (13.22a) and (13.22b) were
computed as

Jp = 7.209829 × 106 , Jz = 1.135788 × 106

and the controllability Grammian was computed using (13.49) as


⎡ ⎤
0.071608 −0.195274 0.178646 −0.054830
⎢ −0.195274 0.533746 −0.489317 0.150472 ⎥
Kc = ⎢ ⎣ 0.178646 −0.489317


0.449437 −0.138452
−0.054830 0.150472 −0.138452 0.042721

13.6.2 Weighted Pole and Zero Sensitivity Minimization


Without Imposing l2 -Scaling Constraints
By choosing γ = 1 in (13.37), ε = 10−8 in (13.43), and P 0 = I 4 in (13.42)
as an initial estimate, it took the algorithm addressed in Section 13.3 two
iterations to converge to
⎡ ⎤
0.318569 −0.890465 0.835038 −0.262540
⎢ −0.890465 2.494146 −2.343591 0.738320 ⎥
P = 103 ⎢ ⎣ 0.835038 −2.343591

2.206461 −0.696498 ⎦
−0.262540 0.738320 −0.696498 0.220301

which yields
⎡ ⎤
0.504660 −1.279387 1.093553 −0.313564
1 ⎢ −1.279387 3.491572 −3.188277 0.973880 ⎥
T =P2 = 10 ⎢
⎣ 1.093553 −3.188277

3.110342 −1.014605 ⎦
−0.313564 0.973880 −1.014605 0.356118

From (13.36a) and (13.36b), it follows that

Jp (T ) = 4.000000, Jz (T ) = 1.819305 × 105

and from (13.33) we obtained


⎡ ⎤
0.938212 0.125295 −0.023034 0.057956
⎢ −0.112124 0.961707 0.106409 0.016275 ⎥
A=⎢ ⎣ −0.039345 −0.091772

0.920683 0.074085 ⎦
−0.073949 −0.006215 −0.060044 0.897698
13.6 Numerical Experiments 317
⎡ ⎤
6.180513
⎢ 6.309750 ⎥
b = 10−3 ⎢ ⎥
⎣ 6.330612 ⎦
6.224428
 
c = 5.046598 −12.793870 10.935529 −3.135635
T T
It was observed that A A = A A holds, hence matrix A is normal and
the lower bound n = 4 of Jp (T ) is achieved. The weighted pole and zero
1
sensitivity performance Jγ (P k2 ) for k = 0, 1, 2 were obtained as
 1 1 1

Jγ (P 0 ) Jγ (P 1 ) Jγ (P 2 )
2 2 2

 
= 7.209829 × 106 4.000000 4.000000
from which it is seen that the iterative algorithm converges to the optimal
solution with two iterations.
Since matrix T shown above yields Jp (T ) = 4, we set T = T o . Then,
by applying (13.46) and (13.47) it was found that
⎡ ⎤
0.133686 −0.338914 0.289686 −0.083064
⎢ −0.338914 0.924930 −0.844586 0.257984 ⎥
T opt = ⎢
⎣ 0.289686 −0.844586

0.823940 −0.268772 ⎦
−0.083064 0.257984 −0.268772 0.094337

and 
Jz T opt ) = 593.186873, ζ = 7.017385 × 10−4
It is noted that T opt obtained above corresponds to the optimal coor-
dinate√transformation matrix that minimizes a zero sensitivity measure
Jz (± ζ T o ) with respect to ζ subject to minimal pole sensitivity satisfying
Jp (T o ) = 4.
The optimal realization specified by Aopt = (T opt )−1 AT opt , bopt =
(T ) b, copt = cT opt became
opt −1

⎡ ⎤
0.938212 0.125295 −0.023034 0.057956
⎢ −0.112124 0.961707 0.106409 0.016275 ⎥
Aopt = ⎢⎣ −0.039345 −0.091772

0.920683 0.074085 ⎦
−0.073949 −0.006215 −0.060044 0.897698
318 Pole and Zero Sensitivity Analysis and Minimization
⎡ ⎤
0.233312
⎢ 0.238191 ⎥
bopt = ⎢ ⎥
⎣ 0.238978 ⎦
0.234970
 
copt = 0.133686 −0.338914 0.289686 −0.083064

The optimized pole and zero sensitivity measures corresponding to various


weight values γ are summarized in Table 13.1 where
4

Jz (T ) ≥ (1 + |αl βl |)2 = 97.566165
l=1

always holds. From Table 13.1, it is observed that the lower bound of Jz (T )
was achieved with γ = 0. It is also observed that the sum of pole sensitivity and
zero sensitivity reaches minimum when the weight value γ = 0.5 is chosen.
Moreover, with γ as an adjustable design parameter, the proposed weighted
optimization provides a variety of options to suit the need of a particular
application where the designer may prefer smaller pole sensitivity or smaller
zero sensitivity.

13.6.3 Weighted Pole and Zero Sensitivity Minimization Subject


to l2 -Scaling Constraints Using Lagrange Function
By setting γ = 1 in (13.37), ε = 10−8 in (13.58), and choosing P 0 = I 4 in
(13.56) as an initial estimate, it took the algorithm addressed in Section 13.4
eight iterations to converge to

Table 13.1 Performance comparison in Section 13.6.2


γ Jγ (T ) Jp (T ) Jz (T ) Jp (T )+Jz (T )
5
1.0 4.000000 4.000000 1.819305×10 1.819345×105
0.9 26.138641 9.756561 173.577365 183.333926
0.8 40.878190 13.926822 148.683663 162.610485
0.7 53.474454 18.020813 136.199616 154.220429
0.6 64.638191 22.453675 127.914966 150.368641
0.5 74.608206 27.598918 121.617494 149.216412
0.4 83.435795 34.001550 116.391959 150.393509
0.3 91.030285 42.698223 111.744026 154.442249
0.2 97.096840 56.211294 107.318227 163.529521
0.1 100.813973 83.685603 102.717125 186.402729
0.0 97.566165 283.698424 97.566165 381.264589
13.6 Numerical Experiments 319

⎡ ⎤
0.336146 −0.948542 0.899849 −0.286556
⎢ −0.948542 2.679553 −2.544864 0.811398 ⎥
P = 10 ⎢
⎣ 0.899849 −2.544864

2.419739 −0.772465 ⎦
−0.286556 0.811398 −0.772465 0.246930

which in conjunction with (13.63) led to


⎡ ⎤
1.163353 −0.501460 1.162615 −0.636344
⎢ −3.325136 1.323961 −3.211571 1.916231 ⎥
T =⎢ ⎣ 3.196696 −1.163301


2.981285 −1.933181
−1.032291 0.337201 −0.927826 0.655060

where
Jp (T ) = 4.000000, Jz (T ) = 5452.531195.
By using (13.33), we obtained
⎡ ⎤
0.927393 −0.117544 −0.055973 0.012748
⎢ 0.127480 0.931340 −0.021222 0.036623 ⎥
A=⎢ ⎣ 0.003557

0.063032 0.935066 0.122093 ⎦
0.029127 0.015908 −0.123729 0.924501
⎡ ⎤
0.320110
⎢ 0.041111 ⎥
b=⎢ ⎣ −0.168643 ⎦

0.244432
 
c = 1.163353 −0.501460 1.162615 −0.636344
T T
It was observed that A A = A A holds. This means that matrix A is normal
and the lower bound n = 4 of Jp (T ) is achieved. For the new realization, the
controllability Grammian was found from (13.50) to be
⎡ ⎤
1.000000 0.296865 −0.585417 0.804018
⎢ 0.296865 1.000000 0.585417 0.804018 ⎥
Kc = ⎢ ⎣ −0.585417 0.585417

1.000000 0.000000 ⎦
0.804018 0.804018 0.000000 1.000000

where the l2 -scaling constraints are satisfied. Iγ (P , ξ) performance of 8


iterations is shown in Figure 13.1, from which it is seen that the iterative
algorithm converges with 8 iterations where
320 Pole and Zero Sensitivity Analysis and Minimization

Figure 13.1 Profile of Iγ (P, ξ) during the first 8 iterations with γ = 1.

⎡ ⎤ ⎡ ⎤
ξ1 ξ2 −1.069263 × 106 −4.213525 × 102
⎢ ξ3 ξ4 ⎥ ⎢ 0 −4.496284 × 10−1 ⎥
⎢ ⎥ = ⎢ −7.117870 × 10−2 ⎥
⎣ ξ5 ξ6 ⎦ ⎣ −1.817527 × 10 −5.577404 × 10−4 ⎦
ξ7 ξ8 −1.676641 × 10−5 −5.025417 × 10−7

The optimized pole and zero sensitivity measures subject to l2 -scaling


constraints corresponding to various values of γ are summarized in Table 13.2.

Table 13.2 Lagrange function method subject to scaling constraints


γ Jγ (T ) Jp (T ) Jz (T ) Jp (T )+Jz (T )
1.0 4.000000 4.000000 5452.531195 5456.531195
0.9 26.798631 10.315471 175.147077 185.462547
0.8 41.489644 14.827471 148.138333 162.965804
0.7 53.902744 19.064263 135.192533 154.256796
0.6 64.863774 23.442505 126.995679 150.438183
0.5 74.672573 28.277118 121.068028 149.345146
0.4 83.435914 33.963990 116.417196 150.381186
0.3 91.139040 41.177416 112.551165 153.728580
0.2 97.629962 51.388289 109.190380 160.578669
0.1 102.474584 69.082077 106.184863 175.266940
0.0 104.016409 125.930718 104.016409 229.947127
13.6 Numerical Experiments 321

13.6.4 Weighted Pole and Zero Sensitivity Minimization Subject


to l2 -Scaling Constraints Using Quasi-Newton Algorithm
By choosing γ = 1 in (13.68), ε = 10−8 in (13.70), and starting with an
−1
initial point x0 obtained from the assignment T̂ = I 4 in (13.66), it took
the algorithm addressed in Section 13.5 nineteen iterations to converge to
⎡ ⎤
0.864243 0.900469 0.786701 0.542671
−1 ⎢ −0.490448 0.391427 0.345446 0.722078 ⎥
T̂ = ⎢ ⎣ −0.094086 −0.168073 0.507431 −0.076744 ⎦

0.060769 −0.087704 0.065444 0.422163


or equivalently,
⎡ ⎤
0.149803 −0.161958 0.261061 −0.203040
⎢ −0.366793 0.433947 −0.742778 0.584686 ⎥
T =⎢
⎣ 0.299031 −0.385952

0.706504 −0.563004 ⎦
−0.080834 0.113909 −0.224536 0.181708
where
Jp (T ) = 4.000000, Jz (T ) = 936.412522.
By using (13.33), we obtained
⎡ ⎤
0.955963 0.084559 −0.070022 0.105312
⎢ −0.107373 0.934715 0.072300 0.045359 ⎥
A=⎢ ⎣ 0.033913 −0.107775


0.913094 0.026735
−0.102298 0.006930 −0.057878 0.914528
⎡ ⎤
0.068313
⎢ 0.257916 ⎥
b=⎢ ⎣ 0.497924 ⎦

0.484019
 
c = 0.149803 −0.161958 0.261061 −0.203040
T T
It was observed that A A = A A holds. This implies that matrix A
is normal and the lower bound n = 4 of Jp (T ) is achieved. For the new
realization, the controllability Grammian was found from (13.50) to be
⎡ ⎤
1.000000 0.596733 0.466712 0.147733
⎢ 0.596733 1.000000 0.752591 0.747172 ⎥
Kc = ⎢ ⎣ 0.466712 0.752591 1.000000 0.665045 ⎦

0.147733 0.747172 0.665045 1.000000


322 Pole and Zero Sensitivity Analysis and Minimization

Figure 13.2 Profile of Jγ (T ) during the first 19 iterations with γ = 1.

where the l2 -scaling constraints are satisfied. The pole sensitivity performance
of 19 iterations is shown in Figure 13.2 where γ = 1, i.e., Jγ (T ) =
Jp (T ), from which it is seen that the iterative algorithm converges with
19 iterations.
The optimized pole and zero sensitivity measures subject to the l2 -
scaling constraints corresponding to various values of γ are summarized in
Table 13.3.

Table 13.3 Quasi-Newton method subject to scaling constraints


γ Jγ (T ) Jp (T ) Jz (T ) Jp (T )+Jz (T )
1.0 4.000000 4.000000 936.412522 940.412522
0.9 26.798631 10.315452 175.147242 185.462694
0.8 41.489644 14.827462 148.138371 162.965833
0.7 53.902744 19.064221 135.192631 154.256852
0.6 64.863774 23.442437 126.995780 150.438217
0.5 74.672573 28.277078 121.068068 149.345146
0.4 83.435914 33.963942 116.417229 150.381170
0.3 91.139040 41.177323 112.551204 153.728527
0.2 97.629962 51.388153 109.190414 160.578567
0.1 102.474584 69.081932 106.184879 175.266811
0.0 104.016409 125.930606 104.016409 229.947015
13.7 Summary 323

Table 13.4 Performance comparison among four methods


Method l2 -Sensitivity Jp Jz
Quasi-Newton (γ = 1) 143.884664 4 936.412522
Quasi-Newton (γ = 0) 995.595955 125.930606 104.016409
Lagrange Function (γ = 1) 657.722872 4 5452.531195
Lagrange Function (γ = 0) 995.596747 125.930718 104.016409
Hinamoto et al. [5] 92.906421 4.628256 398.788778
Hinamoto et al. [6] 92.906418 4.628253 398.789252

We now conclude this section with a remark on the numerical results


summarized in Tables 13.2 and 13.3. Concerning the case of γ = 1, it follows
from (13.37) that the use of γ = 1 simply excludes Jz (T ) in the optimization
procedure and, as a result, the zero sensitivity went wildly large as shown in
Tables 13.2 and 13.3. For practical system implementations, therefore, the use
of γ in the range 0 ≤ γ < 1 is recommended.
The problem of minimizing the l2 -sensitivity of a transfer function subject
to l2 -scaling constraints was solved by different techniques in Chapter 12
[5, 6]. In Table 13.4 the performances of the methods presented in this chapter
are compared with those achieved by the methods in Chapter 12 for four
implementation settings. From this table it is observed that in case γ = 1,
the methods presented in this chapter provide reduced pole sensitivity than
those in Chapter 12, and in case γ = 0, the techniques in this chapter produce
reduced zero sensitivity than those in Chapter 12.

13.7 Summary
Three iterative techniques for minimizing a weighted pole and zero sensitivity
measure have been presented. The problem free from l2 -scaling constraints
has been solved by employing a recursive matrix equation. A simple method
has also been given to obtain the optimal coordinate transformation matrix
which minimizes a zero sensitivity measure subject to minimal pole sensitivity.
Moreover, two iterative methods for minimizing the weighted pole and zero
sensitivity measure subject to l2 -scaling constraints have been introduced.
One relaxes the constraints into a single constraint on matrix trace and solves
the relaxed problem with an efficient matrix iteration scheme based on the
Lagrange function and a bisection method, while the other converts the
constrained optimization problem at hand into an unconstrained problem and
solves it using a quasi-Newton algorithm.
324 Pole and Zero Sensitivity Analysis and Minimization

Simulation results in numerical experiments have demonstrated the


validity and effectiveness of the above techniques.

References
[1] L. Thiele, “Design of sensitivity and round-off noise optimal state-
space discrete systems,” Int. J. Circuit Theory Appl., vol. 12, pp. 39–46,
Jan. 1984.
[2] L. Thiele, “On the sensitivity of linear state-space systems,” IEEE Trans.
Circuits Syst., vol. CAS-33, no. 5, pp. 502–510, May 1986.
[3] G. Li, B. D. O. Anderson, M. Gevers and J. E. Perkins, “Optimal
FWL design of state-space digital systems with weighted sensitivity
minimization and sparseness consideration,” IEEE Trans. Circuits Syst. I,
vol. 39, no. 5, pp. 365–377, May 1992.
[4] W.-Y. Yan and J. B. Moore, “On L2 -sensitivity minimization of lin-
ear state-space systems,” IEEE Trans. Circuits Syst. I, vol. 39, no. 8,
pp. 641–648, Aug. 1992.
[5] T. Hinamoto, H. Ohnishi and W.-S. Lu, “Minimization of L2 -sensitivity
for state-space digital filters subject to L2 -dynamic-range scaling con-
straints,” IEEE Trans. Circuits Syst. II, vol. 52, no. 10, pp. 641–645,
Oct. 2005.
[6] T. Hinamoto, K. Iwata and W.-S. Lu, “L2 -sensitivity minimization of
one- and two-dimensional state-space digital filters subject to L2 -scaling
constraints,” IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1804–1812,
May 2006.
[7] M. Gevers and G. Li, Parameterizations in Control, Estimation and
Filtering Problems: Accuracy Aspects, New York: Springer-Verlag,
1993.
[8] P. E. Mantey, “Eigenvalue sensitivity and state-variable selection,” IEEE
Trans. Automatic Contr., vol. AC-13, no. 3, pp. 263–269, Jun. 1968.
[9] R. E. Skelton and D. A. Wagie, “Minimal root sensitivity in linear
systems,” J. Guidance Contr., vol. 7, no. 5, pp. 570–574, Sep.–Oct. 1984.
[10] D. Williamson, “Roundoff noise minimization and pole-zero sensitivity
in fixed-point digital filters using residue feedback,” IEEE Trans. Acoust.,
Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1210–1220, Oct. 1986.
[11] G. Li, “On pole and zero sensitivity of linear systems,” IEEE Trans.
Circuits Syst. I, vol. 44, no. 7, pp. 583–590, Jul. 1997.
References 325

[12] T. Hinamoto, A. Doi and W.-S. Lu “Minimization of weighted pole and


zero sensitivity for state-space digital filters,” IEEE Trans. Circuits Syst. I,
vol. 63, no. 1, pp. 103–113, Jan. 2016.
[13] R. A. Roberts and C. T. Mullis, Digital Signal Processing, Addison-
Wesley Publishing Company, Inc., 1987.
[14] J. E. Dennis and J. J. More, “Quasi-Newton methods, motivation and
theory,” SIAM Rev., vol. 19, no. 1, pp. 46–89, 1977.
[15] R. Fletcher, Practical Methods of optimization, 2nd ed. New York: Wiley,
1987.
[16] T. Kailath, Linear System. Englewood Cliffs, NJ: Prentice-Hall, 1980.
[17] H. Togawa, Handbook of Numerical Methods, Tokyo, Japan, Saience-sha,
1992.
[18] S. Y. Hwang, “Minimum uncorrelated unit noise in state-space digital
filtering,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25,
no. 4, pp. 273–281, Aug. 1977.
14
Error Spectrum Shaping

14.1 Preview
Error feedback is also called error spectrum shaping, and it is known as an
effective method for the reduction of quantization error generated in finite-
word-length (FWL) implementations of IIR digital filters, and it is especially
so when dealing with fixed-point implementations of narrow-band lowpass
filters. Error feedback can be achieved by extracting the quantization error after
multiplication and addition, and then feeding the error signal back through
simple filters. When error feedback is applied to an IIR digital filter with
either external or internal description, it only affects the transfer function of
the quantization error signal, but not the input-output characteristic of the filter.
As a result, error feedback neither alters coefficient sensitivities nor enhances
overflow properties of the filter. It is well known that the level of the filter’s
output quantization noise of an IIR filter tends to become high when the poles
lie close to the unit circle. This problem can also be addressed effectively by
error feedback whose parameters are chosen appropriately so that the zeros of
the transfer function from the quantization error to the filter’s output are tuned
via error spectrum shaping so as to reduce the effects of the quantization noise
at the filter’s output.
In this chapter, we study the problem of minimizing the roundoff noise
at the filter’s output by applying high-order error feedback for both external
and internal descriptions of IIR digital filters. The optimal solution of high-
order error feedback is obtained for IIR digital filters as well as state-space
digital filters. As alternatives to the optimal solution, suboptimal solutions for
the high-order error feedback with symmetric or antisymmetric coefficients
(matrices) are then derived. Finally, we present numerical experiments to
demonstrate the validity and effectiveness of the techniques addressed in this
chapter.

327
328 Error Spectrum Shaping

14.2 IIR Digital Filters with High-Order Error Feedback


14.2.1 N th-Order Optimal Error Feedback
The error feedback is implemented by modifying the quantizer in the filter
structure. In a fixed-point implementation, quantization is usually performed
by discarding the lower bits of the double-precision accumulator (two’s
complement truncation). Hence, the quantization error is equal to the residue
left in the lower part. Figure 14.1 depicts a quantizer with N th-order error
feedback where the error is fed back through a simple FIR filter. Referring to
Figure 14.1, we obtain
x(k) = u(k) + β1 e(k − 1) + β2 e(k − 2) + · · · + βN e(k − N )
(14.1)
e(k) = x̃(k) − x(k)
Substituting the first equation in (14.1) into the second one yields
x̃(k) = u(k) + e(k) + β1 e(k − 1) + β2 e(k − 2) + · · · + βN e(k − N ) (14.2)
By taking the z-transform on the both sides of (14.2), we have
 
X̃(z) = U (z) + 1 + β1 z −1 + β2 z −2 + · · · + βN z −N E(z) (14.3)

e(k-N)
N z -1

e(k-2)
2 z -1
e(k-1)
1 z -1
e(k)

u(k) Q[ .] ~x(k) G(z) y(k)


x(k)
Figure 14.1 A quantizer with N th-order error feedback.
14.2 IIR Digital Filters with High-Order Error Feedback 329

where X̃(z), U (z) and E(z) are the z-transforms of signals x̃(k), u(k) and
e(k), respectively. Let the transfer function from the quantization point to
the filter’s output be denoted by G(z). In general, G(z) is a rational transfer
function of a linear, time-invariant, causal, and stable system of order usually
higher than N . Under the circumstance, we obtain
Y (z) = G(z)X̃(z) = G(z)U (z) + G(z)B(z)E(z) (14.4)
where
B(z) = 1 + β1 z −1 + β2 z −2 + · · · + βN z −N
and Y (z) indicates the z-transform of the output signal y(k). It is standard to
assume that each quantizer is modeled as an independent additive white noise
source with variance σ 2 = 2−2b /12 where (1 + b) is the wordlength (1 bit for
sign). The normalized noise gain (noise variance) from the noise source to the
filter’s output can be written as
2 
σout 1 dz
I(β) = 2 = G(z)B(z)G(z −1 )B(z −1 ) (14.5)
σ 2πj |z|=1 z
which is equivalent to
 π
1
I(β) = |B(ejω )|2 Q(ω)dω (14.6)
π 0

where  T
β = β1 β2 · · · βN , Q(ω) = |G(ejω )|2
Note that
N
 N
N 
  
B(z −1 )B(z) = 1 + βi z i + z −i + βi βl z i−l
i=1 i=1 l=1
N
 N N −1 N −l
      
=1+ βi z i + z −i + βi2 + βi βi+l z l + z −l
i=1 i=1 l=1 i=1
(14.7)
which can be written as
N
 N 
 N
jω 2
|B(e )| = 1 + 2 βi cos(iω) + βi βl cos(i − l)ω (14.8)
i=1 i=1 l=1

because
ejωl + e−jωl = 2 cos(lω) = cos(lω) + cos(−lω)
330 Error Spectrum Shaping

By referring to (14.8) and defining



1 π
qi = Q(ω) cos(iω)dω (14.9)
π 0
the normalized noise gain in (14.6) can be expressed as

 N
N  N

I(β) = βi βl q|i−l| + 2 βi qi + q0
i=1 l=1 i=1 (14.10)

= β T Rβ + 2β T p + q0

where
⎡ ⎤ ⎡ ⎤
q0 q1 ··· qN −1 q1
⎢ q1 q0 ··· qN −2 ⎥ ⎢ q2 ⎥
⎢ ⎥ ⎢ ⎥
R=⎢ .. .. .. .. ⎥, p=⎢ .. ⎥
⎣ . . . . ⎦ ⎣ . ⎦
qN −1 qN −2 · · · q0 qN

The matrix R is recognized as the N × N autocorrelation matrix of the output


error, which is a symmetric, positive-definite Toeplitz matrix. The vector p
is the crosscorrelation vector between the input and output error. To find the
optimal β that minimizes the normalized noise gain I(β), we use (14.10)
to compute the gradient of I(β) with respect to β and set it to null, which
leads to
∂I(β)
= 2Rβ + 2p = 0 (14.11)
∂β
Therefore, the optimal β is found to be

β opt = −R−1 p (14.12)

14.2.2 Computation of Autocorrelation Coefficients


For the present problem, the autocorrelation coefficients qi ’s depend only on
the given rational transfer function G(z) and can thus be determined exactly.
The z-domain version of (14.9) gives the autocorrelation coefficients via the
inverse z transform as

1 dz
qi = z i G(z)G(z −1 ) (14.13)
2πj |z|=1 z
14.2 IIR Digital Filters with High-Order Error Feedback 331

Since the autocorrelation sequence is symmetric, i.e., qi = q−i for any


integer i, (14.13) is as well given in the form

1 z i + z −i dz
qi = G(z)G(z −1 ) (14.14)
2πj |z|=1 2 z
By denoting the impulse response of G(z) by {gk | k = 0, 1, 2, · · · } and
utilizing Cauchy’s integral theorem
 
1 k dz 1, k=0
z = (14.15)
2πj C z 0, k = 0
where C is a counterclockwise contour that encircles the origin, (14.14) can
be written as
 ∞ ∞
1 z i + z −i   dz
qi = gk z −k gl z l
2πj |z|=1 2 z
k=0 l=0
  ∞  ∞
1 1  dz
= gk gl z l+i−k
2 2πj |z|=1 z
 ∞
k=0 l=0

 (14.16)
1 
l−i−k dz
+ gk gl z
2πj |z|=1 z
k=0 l=0
∞ ∞
 ∞
1   
= gk gk−i + gk gk+i = gk gk+i
2
k=0 k=0 k=0

The above equation leads to




qi = g0 gi + cAk−1 bcAk+i−1 b
k=1
∞
(14.17)
= g0 gi + c Ak−1 bbT (Ak−1 )T (Ai )T cT
k=1

= g0 gi + cK c (Ai )T cT
where


G(z) = d + c(zI n − A)−1 b = gk z −k
k=0

gk = cAk−1 b for k ≥ 1, g0 = d
332 Error Spectrum Shaping

and K c is the controllability Grammian of the state-space realization


(A, b, c, d)n of G(z) that can be obtained by solving the Lyapunov equation
K c = AK c AT + bbT
Similarly, the autocorrelation coefficients can also be derived from
qi = g0 gi + bT W o Ai b (14.18)
instead of (14.17) where W o is the observability Grammian of the state-
space realization (A, b, c, d)n of G(z) which can be obtained by solving the
Lyapunov equation
W o = AT W o A + cT c

14.2.3 Error Feedback with Symmetric or


Antisymmetric Coefficients
In practice, the implementation of N th-order optimal error feedback is often
too costly because of the N explicit multiplications required. One way for
reducing the number of multiplications is to constrain B(z) to be symmetric
or antisymmetric. This halves the number of required multiplications. The
symmetry constrains the zeroes of the filter to be exactly on the unit circle in
most cases.

A. Odd-Order Error Feedback with Symmetric Coefficients


Suppose the order of an error feedback filter is odd, say N = 2M + 1, and
the coefficients are symmetric, then the error transfer function B(z) in (14.4)
can be written as
M

−(2M +1)
 
B(z) = 1 + z + βi z −i + z −(2M +1)+i (14.19)
i=1

The polynomial in (14.19) leads to


B(z −1 )B(z) = 2 + z 2M +1 + z −(2M +1)
M
  
+2 βi z i + z −i + z (2M +1)−i + z −(2M +1)+i
i=1
(14.20)
M 
 M

+ βi βl z i−l + z −(i−l) + z 2M +1−(i+l)
i=1 l=1 
+ z −(2M +1)+i+l
14.2 IIR Digital Filters with High-Order Error Feedback 333

which is equivalent to
 M

|B(ejω )|2 = 2 1 + cos(2M + 1)ω + 2 βi {cos iω + cos(2M + 1 − i)ω}
i=1
M 
 M 
+ βi βl {cos(i − l)ω + cos(2M + 1 − i − l)ω}
i=1 l=1
(14.21)
Substituting (14.21) into (14.6) yields
 M
  
I(β o ) = 2 q0 + q2M +1 + 2 βi qi + q2M +1−i
i=1

 M
M 
  (14.22)
+ βi βl qi−l + q2M +1−i−l
i=1 l=1
     
= 2 q0 + q2M +1 + 2β To po + p̃o + β To Ro + R̃o β o

where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
β1 q1 q0 q1 · · · qM −1
⎢ β2 ⎥ ⎢ q2 ⎥ ⎢ q1 q0 · · · qM −2 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
β o = ⎢ .. ⎥ , po = ⎢ .. ⎥ , Ro = ⎢ .. .. .. .. ⎥
⎣ . ⎦ ⎣ . ⎦ ⎣ . . . . ⎦
βM qM qM −1 qM −2 ··· q0
⎡ ⎤ ⎡ ⎤
q2M q2M −1 q2M −2 · · · qM
⎢q2M −1 ⎥ ⎢q2M −2 q2M −3 · · · qM −1 ⎥
⎢ ⎥ ⎢ ⎥
p̃o = ⎢ .. ⎥ , R̃o = ⎢ .. .. . .. ⎥
⎣ . ⎦ ⎣ . . . . . ⎦
qM +1 qM qM −1 · · · q1

The optimal β o is found by setting ∂I(β o )/∂β o = 0 as


 −1  
β opt
o = − Ro + R̃o po + p̃o (14.23)

B. Odd-Order Error Feedback with Antisymmetric Coefficients


Suppose the order of an error feedback filter is odd, say N = 2M + 1, and the
coefficients are antisymmetric, then the error transfer function B(z) in (14.4)
can be written as
334 Error Spectrum Shaping
M

−(2M +1)
 
B(z) = 1 − z + βi z −i − z −(2M +1)+i (14.24)
i=1

The polynomial in (14.24) leads to


 
B(z −1 )B(z) = 2 − z 2M +1 + z −(2M +1)
M
  
+2 βi z i + z −i − z (2M +1)−i − z −(2M +1)+i
i=1
M 
 M

+ βi βl z i−l + z −(i−l) − z 2M +1−(i+l)
i=1 l=1

− z −(2M +1)+i+l
(14.25)
which is equivalent to
 M

|B(ejω )|2 = 2 1 − cos(2M + 1)ω + 2 βi {cos iω − cos(2M + 1 − i)ω}
i=1
M 
 M 
+ βi βl {cos(i − l)ω − cos(2M + 1 − i − l)ω}
i=1 l=1
(14.26)
Substituting (14.26) into (14.6) yields
 M
  
I(β o ) = 2 q0 − q2M +1 + 2 βi qi − q2M +1−i
i=1

 M
M 
  (14.27)
+ βi βl qi−l − q2M +1−i−l
i=1 l=1
     
= 2 q0 − q2M +1 + 2β To po − p̃o + β To Ro − R̃o β o

where β o , po , p̃o , Ro and R̃o are defined as in (14.22). The optimal β o is


found by setting ∂I(β o )/∂β o = 0 as
 −1  
β opt
o = − Ro − R̃o po − p̃o (14.28)
14.2 IIR Digital Filters with High-Order Error Feedback 335

C. Even-Order Error Feedback with Symmetric Coefficients


Suppose the order of an error feedback filter is even, say N = 2L, and the
coefficients are symmetric, then the error transfer function B(z) in (14.4) can
be written as
L−1
  
B(z) = 1 + βi z −i + z −2L+i + βL z −L + z −2L (14.29)
i=1

The above polynomial leads to


 
B(z −1 )B(z) = 2 + z 2L + z −2L + 2βL z L + z −L
L−1
  
+2 βi z i + z −i + z 2L−i + z −2L+i
i=1

 L−1
L−1   
+ βi βl z i−l + z −(i−l) + z 2L−i−l + z −2L+i+l
i=1 l=1
L−1
  
+2 βL βi z L−i + z −L+i + βL2
i=1
(14.30)
which is equivalent to

|B(ejω )|2 = 2 + 2 cos 2Lω + 4βL cos Lω


L−1

+4 βi {cos iω + cos(2L − i)ω}
i=1

 L−1
L−1 
+2 βi βl {cos(i − l)ω + cos(2L − i − l)ω}
i=1 l=1
L−1

+4 βL βi cos(L − i)ω + βL2
i=1
(14.31)
Substituting (14.31) into (14.6) yields
L−1
 
  
I(β se ) = 2 q0 + q2L + 4βL qL + 4 βi qi + q2L−i
i=1
336 Error Spectrum Shaping

 L−1
L−1  L−1

 
+2 βi βl qi−l + q2L−i−l + 4 βL βi qL−i + βL2 qo
i=1 l=1 i=1
    
  T 2 pe + p̃e T 2(Re + R̃e ) 2r
= 2 q0 + q2L + 2β se + β se β se
2qL 2r T q0
(14.32)
where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
β1 q1 q2L−1 qL−1
⎢ β2 ⎥ ⎢ q2 ⎥ ⎢q2L−2 ⎥ ⎢qL−2 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
β se = ⎢ .. ⎥ , pe = ⎢ .. ⎥ , p̃e = ⎢ .. ⎥ , r = ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦ ⎣ . ⎦ ⎣ . ⎦
βL qL−1 qL+1 q1
⎡ ⎤ ⎡ ⎤
q0 q1 · · · qL−2 q2L−2 q2L−3 · · · qL
⎢ q1 q0 · · · qL−3 ⎥ ⎢q2L−3 q2L−4 · · · qL−1 ⎥
⎢ ⎥ ⎢ ⎥
Re = ⎢ .. .. .. . ⎥ , R̃ e = ⎢ .. .. .. .. ⎥
⎣ . . . .. ⎦ ⎣ . . . . ⎦
qL−2 qL−3 · · · q0 qL qL−1 · · · q2
The optimal β se is found by setting ∂I(β se )/∂β se = 0 as
 −1   
opt 2(R e + R̃ e ) 2r 2 pe + p̃e
β se = − (14.33)
2r T q0 2qL

D. Even-Order Error Feedback with Antisymmetric Coefficients


Suppose the order of an error feedback filter is even, say N = 2L, and the
coefficients are antisymmetric, then the error transfer function B(z) in (14.4)
can be written as
L−1
  
B(z) = 1 + βi z −i − z −2L+i − z −2L , βL = 0 (14.34)
i=1

The above polynomial leads to


 
B(z −1 )B(z) = 2 − z 2L + z −2L
L−1
   
+2 βi z i + z −i − z 2L−i + z −2L+i
i=1 (14.35)
 L−1
L−1    
+ βi βl z i−l + z −(i−l) − z 2L−i−l + z −2L+i+l
i=1 l=1
14.2 IIR Digital Filters with High-Order Error Feedback 337

which is equivalent to
L−1
  
|B(ejω )|2 = 2 − 2 cos 2Lω + 4 βi cos iω − cos(2L − i)ω
i=1
(14.36)
 L−1
L−1   
+2 βi βl cos(i − l)ω − cos(2L − i − l)ω
i=1 l=1

Substituting (14.36) into (14.6) yields


L−1
 
  
I(β ae ) = 2 q0 − q2L + 4 βi qi − q2L−i
i=1

 L−1
L−1    (14.37)
+2 βi βl qi−l − q2L−i−l
i=1 l=1
     
= 2 q0 − q2L + 4β Tae pe − p̃e + 2β Tae Re − R̃e β ae

where pe , p̃e , Re and R̃e are defined as in (14.32), and


 T
β ae = β1 β2 · · · βL−1
The optimal β ae is found by setting ∂I(β ae )/∂β ae = 0 as
 −1  
β opt
ae = − Re − R̃e pe − p̃e (14.38)

Table 14.1 Suboptimal symmetric and antisymmetric error feedback coefficients


Order Symmetric B(z) Antisymmetric B(z)
−1
N =1 1+z 1 − z −1
1 + β1 z −1 + z −2
N =2 −2q1 1 − z −2
β1 =
q0
1 + β1 z −1 + β1 z −2 + z −3 1 + β1 z −1 − β1 z −2 − z −3
N =3 −(q1 + q2 ) −(q1 − q2 )
β1 = β1 =
q 0 + q1 q 0 − q1
1 + β1 z −1 + β2 z −2 + β1 z −3 + z −4
2q1 q2 − q0 (q1 + q3 ) 1 + β1 z −1 − β1 z −3 − z −4
N =4 β1 =
q0 (q0 + q2 ) − 2q12 −(q1 − q3 )
2q1 (q1 + q3 ) − 2q2 (q0 + q2 ) β1 =
β2 = q 0 − q2
q0 (q0 + q2 ) − 2q12
338 Error Spectrum Shaping

Symmetric and antisymmetric solutions of the order 1 to 4 are summarized


in Table 14.1. From the table, it is observed that the first-order solutions have
no free parameters but they possess a fixed real zero at z = ±1, thus being
suitable for narrow-band, lowpass, or highpass filters when only moderate
noise reduction is required. The second- to fourth-order solutions contain at
most 2 free parameters which control the locations of the complex-conjugate
zeros, thus more capable of efficient noise reduction.

14.3 State-Space Filter with High-Order Error Feedback


14.3.1 N th-Order Optimal Error Feedback
Consider a stable, controllable and observable nth-order state-space digital
filter (A, b, c, d)n described by

x(k + 1) = Ax(k) + bu(k)


(14.39)
y(k) = cx(k) + du(k)

where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is


a scalar output, and A, b, c and d are real constant matrices of appropriate
dimensions. By taking quantization performed before matrix-vector multi-
plication into account, a finite-word-length implementation of (14.39) with
high-order error feedback can be obtained as
N

x̃(k + 1) = AQ[x̃(k)] + bu(k) + F i e(k − i + 1)
(14.40)
i=1
ỹ(k) = cQ[x̃(k)] + du(k)

where F 1 , F 2 , · · · , F N are referred to as n × n high-order error feedback


matrices and
e(k) = Q[x̃(k)] − x̃(k)
A block diagram illustrating a state-space model with high-order error
feedback is shown in Figure 14.2.
The coefficient matrices A, b, c, and d in (14.40) are assumed to have
exact fractional Bc -bit representations. The FWL state-variable vector x̃(k)
and each output ỹ(k) has B-bit fractional representations, while the input u(k)
is a (B − Bc )-bit fraction. The quantizer Q[ · ] in (14.40) rounds the B-bit
fraction x̃(k) to (B − Bc )-bit after the multiplications and additions, where
14.3 State-Space Filter with High-Order Error Feedback 339

FN e(k-N+1)

z -1In

FN-1 e(k-N+2)

F2 e(k-1)

z -1In

F1
e(k)

~y(k)
u(k) b z-1In ~x(k) Q[.] c

Figure 14.2 A state-space model with high-order error feedback.

the sign bit is not counted. It is assumed that the roundoff error e(k) can be
modeled as a Gaussian random process with zero mean and covariance σ 2 I n .
By subtracting (14.39) from (14.40), we obtain
N

Δx(k + 1) = AΔx(k) + Ae(k) + F i e(k − i + 1)
(14.41)
i=1
Δy(k) = cΔx(k) + ce(k)

where
Δx(k) = x̃(k) − x(k), Δy(k) = ỹ(k) − y(k)
By taking the z-transform on both sides of (14.41), we have
N

z [ΔX(z) − Δx(0)] = AΔX(z) + AE(z) + F i z −i+1 E(z)
i=1
ΔY (z) = cΔX(z) + cE(z)
(14.42)
340 Error Spectrum Shaping

where ΔX(z), ΔY (z) and E(z) represent the z-transforms of Δx(k), Δy(k)
and e(k), respectively. By setting Δx(0) = 0, it follows from (14.42) that
ΔY (z) = H e (z)E(z)
 N


H e (z) = c(zI n − A)−1 A+ F i z −i+1 + c
i=1
 N
 (14.43)

−i+1
= c(zI n − A)−1 A+ F iz + zI n − A
i=1
= z G(z)B(z)
where
N

G(z) = c(zI n − A)−1 , B(z) = I n + F i z −i
i=1

We now define the normalized noise gain J(F ) = σout 2 /σ 2 with F =

[F T1 , F T2 , · · · , F TN ]T in terms of the transfer function H e (z) = zG(z)


B(z) as
  
1 dz
J(F ) = tr B H (z)GH (z)G(z)B(z) (14.44)
2πj |z|=1 z

where B H (z) and GH (z) denote the conjugate transpose of B(z) and G(z),
respectively. The problem being considered here is to obtain the error feedback
matrices F 1 , F 2 , · · · , F N which minimize the normalized noise gain J(F )
in (14.44).
By substituting G(z) and B(z) in (14.43) into (14.44), we obtain
  
1  N   N 
T i −l dz
J(F ) = tr In + F i z Q(z) I n + F lz
2πj |z|=1 z
i=1 l=1
  
dz  1 dz 
N
1
= tr Q(z) + (z i + z −i )Q(z) Fi
2πj |z|=1 z 2πj |z|=1 z
i=1
 
N N  1 dz 
T i−l
+ Fi z Q(z) Fl
2πj |z|=1 z
i=1 l=1
(14.45)
where
Q(z) = GH (z)G(z)
14.3 State-Space Filter with High-Order Error Feedback 341

By defining

1 z i + z −i dz
Qi = Q(z) for i = 0, 1, · · · , N (14.46)
2πj |z|=1 2 z
the normalized noise gain in (14.45) can be expressed as
 N N  N

 
J(F ) = tr Q0 + 2 Qi F i + F Ti Q|i−l| F l
i=1 i=1 l=1 (14.47)
 
= tr Q0 + 2S T F + F T RF
where
⎡ ⎤ ⎡ ⎤
Q1 Q0 Q1 · · · QN −1
⎢Q ⎥ ⎢ Q1 Q0 · · · QN −2 ⎥
⎢ 2⎥ ⎢ ⎥
S = ⎢ . ⎥, R=⎢ .. .. .. .. ⎥
⎣ .. ⎦ ⎣ . . . . ⎦
QN QN −1 QN −2 ··· Q0
The optimal solution is found by setting ∂J(F )/∂F = 0 and solving the
equation for F = [F T1 , F T2 , · · · , F TN ]T , which gives
F opt = −R+ S (14.48)
The optimal solution obtained above minimizes (14.47) as
 
Jmin (F opt ) = tr Q0 − S T R+ S (14.49)
where R+ is the pseudoinverse matrix of R.

14.3.2 Computation of Qi for i = 0, 1, · · · , N − 1


By substituting


G(z) = c(zI n − A)−1 = cAk z −(k+1) (14.50)
k=0

into (14.46), we obtain


 ∞ ∞
1 z i + z −i   k T T dz
Qi = (A ) c cAl z k−l
2πj |z|=1 2 z
k=0 l=0
 (14.51)
1 ∞  ∞  z k+i−l + z k−i−l  dz
= (Ak )T cT cAl
2πj |z|=1 2 z
k=0 l=0
342 Error Spectrum Shaping

Applying the Cauchy integral theorem in (14.15) to (14.51) yields

1  k T T 

Qi = (A ) c cAk+i + (Ak+i )T cT cAi
2
k=0 (14.52)
1 
= W o Ai + (Ai )T W o
2
where


Wo = (Ak )T cT cAk
k=0
is the observability Grammian of the filter in (14.39), that can be obtained by
solving the Lyapunov equation

W o = AT W o A + cT c

14.3.3 Error Feedback with Symmetric or


Antisymmetric Matrices
In order to reduce the number of multiplications, the coefficient matrices of
the error feedback filter B(z) in (14.43) is constrained to be symmetric or
antisymmetric. In this way, the number of multiplications required will be
reduced by a half.

A. Odd-Order Error Feedback with Symmetric Matrices


Suppose the error feedback filter B(z) has an odd order, say N = 2M + 1,
then a condition for the coefficient matrices of the error feedback filter B(z)
to be symmetric is given by

F i = F 2M +1−i for i = 0, 1, 2, · · · , M (14.53)

where F 0 = I n . The error feedback filter B(z) in (14.43) can then be written
as
M
  
B(z) = I n + F i z −i + z −(2M +1−i) + z −(2M +1) I n (14.54)
i=1

which yields
 
B H (z)Q(z)B(z) = 2Q(z) + z 2M +1 + z −(2M +1) Q(z)
14.3 State-Space Filter with High-Order Error Feedback 343

M 
 
+ z l + z −l + z 2M +1−l + z −(2M +1−l) Q(z)F l
l=1
M 
 
+ z i + z −i + z 2M +1−i + z −(2M +1−i) F Ti Q(z)
i=1 (14.55)
M 
M 

+ z i−l + z −(i−l) + z 2M +1−i−l
i=1 l=1 
+ z −(2M +1−i−l) F Ti Q(z)F l

By substituting (14.55) into (14.44), we obtain


 M 
 
J(F o ) = 2 tr Q0 + Q2M +1 + 2 Qi + Q2M +1−i F i
i=1

M 
 M  
(14.56)
+ F Ti Q|i−l| + Q2M +1−i−l F l
i=1 l=1
 
= 2 tr Q0 + Q2M +1 + 2(S o + S̃ o )T F o + F To (Ro + R̃o )F o

where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
F1 Q1 Q0 Q1 · · · QM −1
⎢F ⎥ ⎢Q ⎥ ⎢ Q Q0 · · · QM −2 ⎥
⎢ 2⎥ ⎢ 2⎥ ⎢ 1 ⎥
F o = ⎢ . ⎥ , S o = ⎢ . ⎥ , Ro = ⎢ . .. .. .. ⎥
⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. . . . ⎦
FM QM QM −1 QM −2 ··· Q0
⎡ ⎤ ⎡ ⎤
Q2M Q2M −1 Q2M −2 · · · QM
⎢Q ⎥ ⎢Q · · · QM −1 ⎥
⎢ 2M −1 ⎥ ⎢ 2M −2 Q2M −3 ⎥
S̃ o = ⎢ . ⎥ , R̃o = ⎢ . .. .. .. ⎥
⎣ .. ⎦ ⎣ .. . . . ⎦
QM +1 QM QM −1 ··· Q1

The optimal solution is found by setting ∂J(F o )/∂F o = 0 and solving the
equation for F o , which gives

F opt +
o = −(Ro + R̃o ) (S o + S̃ o ) (14.57)
344 Error Spectrum Shaping

The optimal solution obtained above minimizes (14.56) as


 
Jmin (F opt
o ) = 2 tr Q0 + Q2M +1 − (S o + S̃ o )T
(R o + R̃ o )+
(S o + S̃ o )
(14.58)

B. Odd-Order Error Feedback with Antisymmetric Matrices


Suppose the error feedback filter B(z) has an odd order, say N = 2M + 1,
then a condition for the coefficient matrices of the error feedback filter B(z)
to be antisymmetric is given by

F i = −F 2M +1−i for i = 0, 1, 2, · · · , M (14.59)

where F 0 = I n . The error feedback filter B(z) in (14.43) can then be


written as
M
  
B(z) = I n + F i z −i − z −(2M +1−i) − z −(2M +1) I n (14.60)
i=1

which yields
 
B H (z)Q(z)B(z) = 2Q(z) − z 2M +1 + z −(2M +1) Q(z)
M 
 
+ z l + z −l − z 2M +1−l − z −(2M +1−l) Q(z)F l
l=1
M 
 
+ z i + z −i − z 2M +1−i − z −(2M +1−i) F Ti Q(z)
i=1
M 
M 

+ z i−l + z −(i−l) − z 2M +1−i−l
i=1 l=1

− z −(2M +1−i−l) F Ti Q(z)F l
(14.61)
By substituting (14.61) into (14.44), we obtain
 M 
 
J(F o ) = 2 tr Q0 − Q2M +1 + 2 Qi − Q2M +1−i F i
i=1
14.3 State-Space Filter with High-Order Error Feedback 345

M 
 M  
+ F Ti Q|i−l| − Q2M +1−i−l F l
i=1 l=1 (14.62)
 
= 2 tr Q0 − Q2M +1 + 2(S o − S̃ o )T F o + F To (Ro − R̃o )F o

where F o , S o , S̃ o , Ro and R̃o are defined as in (14.56).


The optimal solution is found by setting ∂J(F o )/∂F o = 0 and solving
the equation for F o , which gives

F opt +
o = −(Ro − R̃o ) (S o − S̃ o ) (14.63)

The optimal solution obtained above minimizes (14.62) as


 
Jmin (F opt
o ) = 2 tr Q0 − Q2M +1 − (S o − S̃ o )T
(R o − R̃ o )+
(S o − S̃ o )
(14.64)

C. Even-Order Error Feedback with Symmetric Matrices


Suppose the error feedback filter B(z) has an even order, say N = 2L, then
a condition for the coefficient matrices of the error feedback filter B(z) to be
symmetric is given by

F i = F 2L−i for i = 0, 1, 2, · · · , L − 1 (14.65)

where F 0 = I n . The error feedback filter B(z) in (14.43) can then be written
as
L−1
  
B(z) = I n + F i z −i + z −(2L−i) + F L z −L + z −2L I n (14.66)
i=1

which yields
 
B H (z)Q(z)B(z) = 2 Q(z) + z 2L + z −2L Q(z)
L−1
  
+ z l + z −l + z 2L−l + z −(2L−l) Q(z)F l
l=1
L−1
  
+ F Ti z i + z −i + z 2L−i + z −(2L−i) Q(z)
i=1
346 Error Spectrum Shaping
   
+ z L + z −L Q(z)F L + F TL z L + z −L Q(z)
L−1
  
+ F TL z L−l + z −(L−l) Q(z)F l
l=1
L−1
  
+ F Ti z L−i + z −(L−i) Q(z)F L + F TL Q(z)F L (14.67)
i=1

 L−1
L−1  
+ F Ti z i−l + z −(i−l) + z 2L−l−i
i=1 l=1

+ z −(2L−l−i) Q(z)F l
By substituting (14.67) into (14.44), we have
   L−1
 
J(F se ) = tr 2 Q0 + Q2L + 4 QL F L + 4 Qi + Q2L−i F i
i=1

 L−1
L−1     L−1

+2 F Ti Q|i−l| + Q2L−i−l F l + 4 F Ti QL−i F L
i=1 l=1 i=1

+ F TL Q0 F L
(14.68)
where  T
F se = F T1 F T2 · · · F TL
The normalized noise gain in (14.68) can be written as
  
  P e + P̃ e
J(F se ) = tr 2 Q0 + Q2L + 4F Tse
QL
   (14.69)
T 2(Re + R̃e ) 2Γ
+ F se F se
2ΓT Q0
where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
Q1 Q2L−1 Q0 Q1 · · · QL−2
⎢ ⎥
Q2 ⎢Q ⎥ ⎢ Q Q0 · · · QL−3 ⎥
⎢ ⎥ ⎢ 2L−2 ⎥ ⎢ 1 ⎥
Pe = ⎢ ⎥ ,
.. P̃ e = ⎢ . ⎥ , R e = ⎢ . .. .. .. ⎥
⎣ ⎦ . ⎣ .. ⎦ ⎣ .. . . . ⎦
QL−1 QL+1 QL−2 QL−3 · · · Q0
14.3 State-Space Filter with High-Order Error Feedback 347
⎡ ⎤ ⎡ ⎤
Q2L−2 Q2L−3 · · · QL QL−1
⎢Q ⎥ ⎢Q ⎥
⎢ 2L−3 Q2L−4 · · · QL−1 ⎥ ⎢ L−2 ⎥
R̃e = ⎢ . . .. . ⎥, Γ=⎢ . ⎥
⎣ .. .. . .. ⎦ ⎣ .. ⎦
QL QL−1 · · · Q2 Q1
The optimal solution is found by setting ∂J(F se )/∂F se = 0 and solving the
equation for F se , which gives
 +  
opt 2(Re + R̃e ) 2Γ P e + P̃ e
F se = −2 (14.70)
2ΓT Q0 QL

The optimal solution obtained above minimizes (14.69) as


  T  +
  P e + P̃ e 2(R e + R̃ e ) 2Γ
Jmin (F opt
se ) = tr 2 Q0 + Q2L − 4
QL 2ΓT Q0
 
P e + P̃ e
(14.71)
QL

D. Even-Order Error Feedback with Antisymmetric Matrices


Suppose the order of an error feedback filter is even, say N = 2L, and the
coefficient matrices are antisymmetric, then a condition for the coefficient
matrices of the error feedback filter B(z) to be antisymmetric is given by

F i = −F 2L−i for i = 0, 1, 2, · · · , L − 1 (14.72)

where F 0 = I n and F L = 0. The error feedback filter B(z) in (14.43) can


then be written as
L−1
  
B(z) = I n + F i z −i − z −(2L−i) − z −2L I n (14.73)
i=1

which yields
 
B H (z)Q(z)B(z) = 2 Q(z) − z 2L + z −2L Q(z)
L−1
  
+ z l + z −l − z 2L−l − z −(2L−l) Q(z)F l
l=1
348 Error Spectrum Shaping

L−1
  
+ F Ti z i + z −i − z 2L−i − z −(2L−i) Q(z)
i=1

 L−1
L−1   (14.74)
+ F Ti z i−l + z −(i−l) − z 2L−l−i
i=1 l=1

− z −(2L−l−i) Q(z)F l
By substituting (14.74) into (14.44), we have
 L−1
 
J(F ae ) = 2 tr Q0 − Q2L + 2 Qi − Q2L−i F i
i=1
L−1
 (14.75)
 L−1
  
T
+ F i Q|i−l| − Q2L−i−l F l
i=1 l=1

where  T
F ae = F T1 F T2 · · · F TL−1
The normalized noise gain in (14.75) can be written as
     
T T
J(F ae ) = 2 tr Q0 − Q2L + 2F ae P e − P̃ e + F ae Re − R̃e F ae
(14.76)

Table 14.2 Suboptimal symmetric and antisymmetric error feedback matrices

Order Symmetric B(z) Antisymmetric B(z)

N =1 In + z −1 In In − z −1 In
In + F1 z −1 + z −2 In
N =2 In − z −2 In
F1 = −2Q+
0 Q1

I n + F1 z −1 + F1 z −2 + z −3 I n In + F1 z −1 − F1 z −2 − z −3 I n
N =3
F1 = −(Q0 +Q1 )+ (Q1 +Q2 ) F1 = −(Q0 −Q1 )+ (Q1 −Q2 )

In + F1 z −1 + F2 z −2 + F1 z −3 + z −4 I n
⎡ ⎤ ⎡ ⎤+⎡ ⎤ I n + F1 z −1 − F1 z −3 − z −4 I n
N = 4 F1 2(Q0+Q2 ) 2Q1 Q1+Q3
⎣ ⎦= −2⎣ ⎦⎣ ⎦ F1 = −(Q0 −Q2 )+ (Q1 −Q3 )
F2 2Q1 Q0 Q2
14.4 Numerical Experiments 349

where P e , P̃ e , Re , and R̃e are defined in (14.69). The optimal solution is


found by setting ∂J(F ae )/∂F ae = 0 and solving the equation for F ae , which
gives  +  
F opt
ae = − Re − R̃e P e − P̃ e (14.77)
The optimal solution obtained above minimizes (14.76) as
  T  +  
Jmin (F opt
ae ) = 2 tr Q 0 − Q2L − P e − P̃ e R e − R̃ e P e − P̃ e
(14.78)
Symmetric and antisymmetric solutions for order N = 1 to N = 4 are
summarized in Table 14.2.

14.4 Numerical Experiments


14.4.1 Example 1 : An IIR Digital Filter
As an example, consider a 4th-order elliptic lowpass filter whose transfer
function is given by
G(z) =
1
(1 − 1.773152z −1 + 0.801564z −2 )(1 − 1.833400z −1 + 0.927062z −2 )
1
=
1− 3.606552z −1 + 4.979522z −2 − 3.113409z −3 + 0.743099z −4
Assuming that signal quantization is performed after the accumulation of
products, the normalized noise gain in (14.10) of this filter in direct form
implementation without error feedback was found to be 43.5068 dB. Table 14.3
summarizes the normalized noise gains of the elliptic IIR filter when optimal,
symmetric, and antisymmetric error feedbacks of order N for N = 1, 2, 3,
and 4 were applied. The parameters characterizing the error feedback loops
are also included in the table.
It is observed that increasing the order of B(z) reduces the optimal noise
gain and, as expected, with N = 4 the solution B(z) = 1/G(z) achieves
complete noise cancellation, i.e., a zero normalized noise gain. We remark
that the same solution can be approached by solving a higher order error
feedback from (14.12). This solution can also be interpreted as a double-
precision implementation of the filter [5, 6]. In the case where the noise
transfer function G(z) is not purely recursive, complete cancellation is no
longer possible. However, the solution asymptotically approaches the 0 dB
level when the order of the error feedback increases.
350 Error Spectrum Shaping

Table 14.3 Error feedback for a 4th-order elliptic lowpass filter


N Optimal Symmetric Antisymmetric
Noise (dB) 30.2480 49.4751 30.3002
1 β1 –0.976105 1 –1
Noise (dB) 15.4705 15.5071 36.2320
β1 –1.935827 –1.952210 0
2
β2 0.983216 1 –1
Noise (dB) 3.4891 21.4206 36.1761
β1 –2.887382 –0.952611 –2.919043
3 β2 2.856706 –0.952611 2.919043
β3 –0.967798 1 –1
Noise (dB) 0.0000 0.5971 8.9126
β1 –3.606552 –3.855180 –1.919584
β2 4.979522 5.713413 0
4 β3 –3.113409 –3.855180 1.919584
β4 0.743099 1 –1

The implementation of error feedback is often the most efficient if explicit


multiplications are not needed at all. For example, if the error-feedback
coefficients are quantized to powers of two, only additions or subtractions
with shift are needed for implementation. The results of rounding the optimal
error feedback coefficients to integers or a power-of-two representation with
3 bits after the binary point are summarized in Table 14.4. We remark that
improved results may be achieved by discrete optimization in conjunction
with dynamic programming.

14.4.2 Example 2 : A State-Space Digital Filter


Consider a state-space digital filter (Ao , bo , co , d)3 described by
⎡ ⎤ ⎡ ⎤
0 0 0.4537681 1

Ao = 1 0 −1.5561612 , ⎦ bo = 0 ⎦

0 1 1.9748611 0
 
co = 10−1 0.7930672 1.7963671 2.5451875
d = 1.5941494 × 10−2
By applying the coordinate transformation matrix given by

T o = diag{2.1244192, 4.9806829, 4.1306156}


14.4 Numerical Experiments 351

Table 14.4 Powers-of-two error feedback for a 4th-order elliptic lowpass filter
Infinite Integer 3-Bit
N
Precision Quantization Quantization
Noise (dB) 15.4705 19.3827 22.2841
β1 –1.935827 –2 –1.875
2 β2 0.983216 1 1.000
Noise (dB) 3.4891 9.6814 6.3832
β1 –2.887382 –3 –2.875
β2 2.856706 3 2.875
3
β3 –0.967798 –1 –1.000
Noise (dB) 0.0000 30.2714 1.9722
β1 –3.606552 –4 –3.625
β2 4.979522 5 5.000
4 β3 –3.113409 –3 –3.125
β4 0.743099 1 0.750

to the original filter (Ao , bo , co , d)3 , a new realization specified by A =


T −1 −1
o Ao T o , b = T o bo , c = co T o and d was constructed as
⎡ ⎤ ⎡ ⎤
0 0 0.8822843 0.4707169
A = ⎣ 0.4265317 0 −1.2905667 ⎦ , b=⎣ 0 ⎦
0 1.2057968 1.9748611 0
 
c = 0.1684807 0.8947135 1.0513191
d = 1.5941494 × 10−2
The controllability Grammian K c and the observability Grammian W o were
then computed by solving the Lyapunov equations K c = AK c AT + bbT
and W o = AT W o A + cT c as
⎡ ⎤
1.000000 −0.848957 0.769793
K c = ⎣ −0.848957 1.000000 −0.914218 ⎦
0.769793 −0.914218 1.000000
⎡ ⎤
1.042736 2.182632 1.257200
W o = ⎣ 2.182632 5.575521 3.950707 ⎦
1.257200 3.950707 3.284172
respectively, and the normalized noise gain of the filter (A, b, c, d)3 without
error feedback was found from (14.47) to be
J(0) = tr[Q0 ] = tr[W o ] = 9.902430 (9.957418 dB)
352 Error Spectrum Shaping

A. Optimal Error Feedback


As an example, we consider the problem of applying high-order error feedback
to the filter (A, b, c, d)3 , and seek to find the optimal solutions which utilize
error feedback with N = 1, 2, and 3 (partially 4).
(1) Case N = 1 : The optimal error feedback matrix was obtained using
(14.48) as
⎡ ⎤
−1.480125 −0.318225 1.163136
F 1 = ⎣ 0.499455 −0.784328 −1.242740 ⎦
−0.379979 0.076911 0.289592

and the normalized noise gain of the filter was computed from (14.47) as

J(F ) = 2.968791 (4.725796 dB)

(2) Case N = 2 : The optimal error feedback matrices were derived from
(14.48) as
⎡ ⎤
−2.581439 1.096233 4.790081
F 1 = ⎣ 0.758043 −2.656162 −4.057192 ⎦
−0.561834 1.074577 1.862577
⎡ ⎤
0.217390 −3.681772 −5.053548
F 2 = ⎣ 0.607375 3.863745 3.919715 ⎦
−0.504042 −2.412000 −2.240139

and the normalized noise gain of the filter was calculated from (14.47) as

J(F ) = 1.259031 (1.000363 dB)

(3) Case N = 3 : The optimal error feedback matrices were found from
(14.48) to be
⎡ ⎤
−2.352663 0.319526 3.478115
F 1 = ⎣ 0.555381 −3.777495 −5.153576 ⎦
−0.440912 2.064015 2.916668
⎡ ⎤
−0.0893259 1.394268 1.741796
F 2 = ⎣ 0.8827037 2.564305 1.885559 ⎦
−0.6555847 −1.520423 −0.900386
14.4 Numerical Experiments 353
⎡ ⎤
0.281060 −0.463009 −0.998642
F 3 = ⎣ −0.241377 −1.353617 −1.328390 ⎦
0.142676 1.119958 1.184455
and the normalized noise gain of the filter was found from (14.47) to be

J(F ) = 1.212530 (0.836924 dB)

B. Suboptimal Symmetric or Antisymmetric Error Feedback


Now we present suboptimal symmetric or antisymmetric solutions which
employ error feedback with N = 2, 3, and 4.
(1) Case N = 2 : The suboptimal symmetric error feedback matrix was
obtained using (14.70) (or Table 14.2) as
⎡ ⎤
−2.960249 −0.636449 2.326272
F 1 = ⎣ 0.998910 −1.568656 −2.485479 ⎦
−0.759957 0.153823 0.579183

and the normalized noise gain of the filter was computed from (14.69) as

J(F 1 ) = 2.082088 (3.184990 dB)

(2) Case N = 3 : The suboptimal symmetric and antisymmetric error


feedback matrices were derived from (14.57) and (14.63) (or Table 14.2) as
⎡ ⎤
−2.364050 −2.585538 −0.263467
F 1 = ⎣ 1.365418 1.207583 −0.137477 ⎦ : symmetric
−1.065876 −1.337423 −0.377562
⎡ ⎤
−2.798829 4.778005 9.843629
F 1 = ⎣ 0.150669 −6.519907 −7.976907 ⎦ : antisymmetric
−0.057792 3.486577 4.102716

and the normalized noise gains of the filter were calculated from (14.56) and
(14.62) as

J(F 1 ) = 2.815986 (4.496304 dB) : symmetric


J(F 1 ) = 2.220137 (3.463798 dB) : antisymmetric

(3) Case N = 4 : The suboptimal symmetric and antisymmetric error


feedback matrices were found from (14.70) and (14.77) (or Table 14.2) to be
354 Error Spectrum Shaping
⎡ ⎤
−2.218375 −0.165585 2.641467
F 1 = ⎣ 0.320466 −5.130139 −6.489098 ⎦ : symmetric
−0.280215 3.186687 4.081233

⎡ ⎤
0.293638 2.859654 2.962318
F 2 = ⎣ 1.563867 5.098261 3.993561 ⎦ : symmetric
−1.188198 −3.022329 −1.936498

⎡ ⎤
−2.633723 0.782534 4.476757
F 1 = ⎣ 0.796758 −2.423879 −3.825186 ⎦ : antisymmetric
−0.583588 0.944057 1.732213

respectively, and from (14.69) and (14.76), the normalized noise gains of the
filter were found to be
J(F ) = 2.082088 (3.184990 dB) : symmetric
J(F 1 ) = 2.768031 (4.421709 dB) : antisymmetric

respectively, where F = [F T1 , F T2 ]T .
When N = 1, 2, 3, and 4, optimal, symmetric, and antisymmetric error
feedbacks were applied to the state-space digital filter in Example 2, the results
obtained are summarized in Table 14.5.
From the table, it is observed that an increase of N in B(z) tends to reduce
the normalized noise gain in optimal solutions, as expected, while it does
not always reduce the normalized noise gain in symmetric or antisymmetric
solutions.
The results of rounding the optimal and suboptimal error feedback coeffi-
cients to integers or a power-of-two representation with 3 bits after the binary
point are summarized in Table 14.6.

Table 14.5 Error feedback noise gain (dB) for a 4th-order state-space lowpass filter
N Optimal Symmetric Antisymmetric
1 4.7258 (dB) 15.5855 (dB) 5.3438 (dB)
2 1.0004 (dB) 3.1850 (dB) 9.9092 (dB)
3 0.8369 (dB) 4.4963 (dB) 3.4638 (dB)
4 0.5852 (dB) 3.1850 (dB) 4.4217 (dB)
References 355

Table 14.6 Powers-of-two error feedback noise gain (dB) obtained by rounding optimal and
suboptimal solutions
Infinite Integer 3-Bit
N
Precision Quantization Quantization
1 4.7258 (dB) 5.0011 (dB) 4.7690 (dB)
2 1.0004 (dB) 5.6358 (dB) 1.3982 (dB)
Optimal
3 0.8369 (dB) 8.3475 (dB) 1.5807 (dB)
2 3.1850 (dB) 8.7398 (dB) 3.2645 (dB)
3 4.4963 (dB) 9.9118 (dB) 4.5565 (dB)
Symmetric
4 3.1850 (dB) 9.5234 (dB) 3.4388 (dB)
3 3.4638 (dB) 5.5044 (dB) 3.4718 (dB)
Antisymmetric 4 4.4217 (dB) 6.8862 (dB) 3.4388 (dB)

14.5 Summary
The optimal solution of general high-order error feedback has been presented
for both external and state-space descriptions of IIR digital filters. As alter-
natives for efficient implementations, suboptimal schemes with symmetric or
antisymmetric coefficients has been examined. In addition, numerical experi-
ments have been presented to demonstrate the validity and effectiveness of
the present techniques, where the error feedback quantizer with power-of-two
coefficients has been considered.

References
[1] T. I. Laakso and I. O. Hartimo, “Noise reduction in recursive digital filters
using high-order error feedback,” IEEE Trans. Signal Process., vol. 40,
no. 5, pp. 1096–1107, May 1992.
[2] T. Hinamoto and S. Karino “Noise reduction in state-space digital filters
using high-order error feedback,” IEICE Trans. Part A, vol. J77-A, no. 9,
pp. 1214–1222, Sep. 1994.
[3] T. Hinamoto and S. Karino “High-order error feedback for noise reduction
in state-space digital filters,” in Proc. Int. Conf. Acoust., Speech, Signal
Process. (ICASSP’94), May, 1994, vol. 3, pp. 1387–1390.
[4] P. P. Vaidyanathan, “On error-spectrum shaping in state-space digital
filters,” IEEE Trans. Circuits Syst., vol. CAS-32, no. 1, pp. 88–92, Jan.
1985.
356 Error Spectrum Shaping

[5] W. E. Higgins and D. C. Munson, “Noise reduction strategies for digital


filters: Error spectrum shaping versus the optimal linear state-space for-
mulation,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30,
no. 6, pp. 963–973, Dec. 1982.
[6] C. T. Mullis and R. A. Roberts, “An interpretation of error spectrum
shaping in digital filters,” IEEE Trans. Acoust., Speech, Signal Process.,
vol. ASSP-30, no. 6, pp. 1013–1015, Dec. 1982.
15
Roundoff Noise Analysis
and Minimization

15.1 Preview
In the implementation of IIR digital filters with fixed-point arithmetic, it is
of critical significance to reduce the effects of roundoff noise at the filter’s
output. An approach is to synthesize the optimal state-space filter structure for
the roundoff noise gain to be minimized by applying a linear transformation
to state-space coordinates subject to l2 -scaling constraints [1–4]. As another
approach, error feedback is found effective for reducing finite-word-length
(FWL) effects in IIR digital filters, and many error feedback methods have
been proposed in the past [5–14]. Alternatively, the roundoff noise can also
be reduced by introducing delta operators to IIR digital filters [15, 16], or
by adopting a new structure based on the concept of polynomial operators for
digital filter implementation [17]. As a natural extension of the aforementioned
methods, novel methods which combine state-space realization and error
feedback have been developed for achieving better performance [18–20].
Separately and jointly optimized scalar or general error feedback matrix
for state-space filters have been explored [18]. A jointly-optimized iterative
algorithm with a general, diagonal, or scalar error feedback matrix using a
quasi-Newton method has been developed for state-space digital filters [19]. In
addition, the use of high-order error feedback and its effect on noise-reduction
performance for state-space digital filters have been investigated [20].
In the first half of this chapter, following the work by Mullis-Roberts and
Hwang [1–3], a method for synthesizing the optimal internal structure that
minimizes the roundoff noise subject to l2 -scaling constraints is presented.
Unlike the method by Mullis-Roberts and Hwang, however, the present
method relaxes the l2 -scaling constraints into a single constraint on matrix
trace and solves the relaxed problem with an effective closed-form matrix
solution.

357
358 Roundoff Noise Analysis and Minimization

In the second half of this chapter, a joint optimization technique of high-


order error feedback and state-space realization for minimizing the roundoff
noise subject to l2 -scaling constraints is introduced [20]. The objective
function is minimized by employing a quasi-Newton algorithm.
Numerical experiments are included to illustrate the validity and effec-
tiveness of these algorithms and demonstrate their performance.

15.2 Filters Quantized after Multiplications


15.2.1 Roundoff Noise Analysis and Problem Formulation
Consider a stable, controllable and observable state-space digital filter
(A, b, c, d)n described by
x(k + 1) = Ax(k) + bu(k)
(15.1)
y(k) = cx(k) + du(k)
where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is
a scalar output, and A, b, c, and d are n × n, n × 1, 1 × n, and 1 × 1 real
constant matrices, respectively. A block diagram of a state-space digital filter
in (15.1) is shown in Figure 15.1.
Due to product quantization, the actual filter implemented by a FWL
machine is
x̃(k + 1) = Ax̃(k) + bu(k) + α(k) + β(k)
(15.2)
ỹ(k) = cx̃(k) + du(k) + γ(k) + δ(k)
where x̃(k) is the actual state-variable vector, ỹ(k) is the actual output, and
α(k), β(k), γ(k), and δ(k) are n × 1, n × 1, 1 × 1, and 1 × 1 error vectors

x(k+1) x(k)
u(k) b z -1In c y(k)

A
Figure 15.1 Block diagram of a state-space digital filter.
15.2 Filters Quantized after Multiplications 359

x(k+1) x(k)
u(k) b z-1In c y(k)

Figure 15.2 Block diagram of an actual state-space digital filter with several noise sources.

generated due to product quantization associated with the A, b, c, and d


matrices, respectively. A block diagram of an actual state-space digital filter
with several noise sources in (15.2) is illustrated in Figure 15.2.
Subtracting (15.1) from (15.2), we obtain

Δx(k + 1) = AΔx(k) + α(k) + β(k)


(15.3)
Δy(k) = cΔx(k) + γ(k) + δ(k)

where Δx(k) = x̃(k) − x(k) is the state error vector and Δy(k) = ỹ(k) −
y(k) is the output noise. A block diagram of a state-space model for noise
propagation in (15.3) is depicted in Figure 15.3.
Assuming that Δx(0) = 0 in (15.3), we have
k−1
  
Δy(k) = c Ak−l−1 α(l) + β(l) + γ(k) + δ(k) (15.4)
l=0

z-1In c

A
Figure 15.3 Block diagram of a state-space model for noise propagation.
360 Roundoff Noise Analysis and Minimization

One way for measuring the noise component in the above state-space
model is to estimate its average power or variance. Under the usual assump-
tion that the product quantization errors are white noises being statistically
independent from source to source, and from time to time, the expected square
error is  k−1 
 E02
E[Δy(k)2 ] = cAl Q(cAl )T + μ + ν (15.5)
12
l=0

where Q is a diagonal matrix whose ith diagonal element qi is the number of


coefficients in the ith rows of A and b that are neither 0 nor ±1, E02 /12 is the
variance of each noise source, and μ and ν are the number of neither 0 nor ±1
constants in c and d, respectively.
For stable digital filters with distinct natural frequencies, the sequence in
(15.5) is shown to converge, and the variance of the output noise Δy(k) is
given by
 M

2 1  2
E[Δy ] = lim E Δy(k)
M →∞ M +1
k=0
 ∞

 E02
l l T
= cA Q(cA ) + μ + ν (15.6)
12
l=0
 
E02
= tr[QW o ] + μ + ν
12

where W o is the observability Grammian of the filter in (15.1), that can be


obtained by solving the Lyapunov equation

W o = AT W o A + cT c (15.7)

It should be noted that the l2 -scaling constraints on the state-variable vector


x(k) involve the controllability Grammian K c of the filter in (15.1), which
can be computed by solving the Lyapunov equation

K c = AK c AT + bbT (15.8)

Applying a coordinate transformation for the state-variable vector

x(k) = T −1 x(k) (15.9)


15.2 Filters Quantized after Multiplications 361

to the filter in (15.1), we obtain a new realization (A, b, c, d)n described by

x(k + 1) = Ax(k) + bu(k)


(15.10)
y(k) = c x(k) + du(k)
where
A = T −1 AT , b = T −1 b, c = cT
A block diagram illustrating an equivalent state-space digital filter is shown
in Figure 15.4.
Accordingly, the controllability and observability Grammians relating to
(A, b, c, d)n can be expressed as

K c = T −1 K c T −T , W o = T T W oT (15.11)

respectively. In this case, (15.6) can be written as


 
E02
E[Δy 2 ] = tr[Q W o ] + μ + ν (15.12)
12

where Q is a diagonal matrix whose ith diagonal element q i is the number


of coefficients in the ith rows of A and b that are neither 0 nor ±1, μ is the
number of nonzero-or-unity elements in c. Also, the l2 -scaling constraints are
imposed on the state-variable vector x(k) so that

eTi K c ei = 1 for i = 1, 2, · · · , n (15.13)

The problem of roundoff noise minimization is now formulated as follows.


Given an arbitrary initial realization (A, b, c, d)n with associated K c and
W o matrices, find an n × n nonsingular transformation matrix T , and a new

d
-1

-1 -1
T x(k)
u(k) T b z In cT y(k)

-1
T AT
Figure 15.4 Block diagram of an equivalent state-space digital filter.
362 Roundoff Noise Analysis and Minimization

realization (A, b, c, d)n such that tr[Q T T W o T ] is minimized subject to the


l2 -scaling constraints in (15.13).
However, it turns out that the diagonal matrix Q is extremely difficult to
be explicitly expressed as a function of A, b and an arbitrary T .
Hence, under the most pessimistic assumption that Q = (n + 1)I n and
μ = n, i.e., the coefficients of A, b, and c are neither 0 nor ±1, below we shall
investigate the problem of minimizing tr[T T W o T ] subject to the l2 -scaling
constraints in (15.13).

15.2.2 Roundoff Noise Minimization Subject to


l2 -Scaling Constraints
First, we develop an analytical method for minimizing tr[T T W o T ] with
respect to a nonsingular T matrix subject to the l2 -scaling constraints in
(15.13). To this end, we define the Lagrange function
J(P , λ) = tr[W o P ] + λ (tr[K c P −1 ] − n) (15.14)
where P = T T T and λ is a Lagrange multiplier. The optimal coordinate
transformation matrix T can be determined by solving the equations
∂J(P , λ)
= W o − λP −1 K c P −1 = 0
∂P
(15.15)
∂J(P , λ)
= tr[K c P −1 ] − n = 0
∂λ
which lead to
P W o P = λK c , tr[K c P −1 ] = n (15.16)
From (15.16), it follows that
√ − 12
 1 1 1 − 12
2
P = λWo W o2 K c W o2 Wo
n (15.17)
1 1 1 
√ tr[K c W o ] 2 = √ θi = n
λ λ i=1

where θi2 for i = 1, 2, · · · , n denote the eigenvalues of K c W o . Hence


n
1  −1
 1 1 1
−1
2
P = θi W o 2 W o2 K c W o2 W o 2 (15.18)
n
i=1
15.2 Filters Quantized after Multiplications 363

Substituting (15.18) into (15.14) yields the minimum value of J(P , λ) as


n
1  2
min J(P , λ) = θi (15.19)
P ,λ n
i=1

Referring to (15.18), the optimal coordinate transformation matrix T that


minimizes (15.14) can now be obtained in closed form as follows:

1 
n 1
− 12
 1 1 1
2 4
T =√ θi Wo W o2 K c W o2 Uo (15.20)
n
i=1

where U o is an arbitrary n × n orthogonal matrix. From (15.20), it follows


that
n
 −1  1 1 1
U To W o2 K c W o2 U o
2
K c = T −1 K c T −T = n θi (15.21)
i=1

Next, we choose the n × n orthogonal matrix U o such that matrix K in


(15.21) satisfies the l2 -scaling constraints in (15.13). To this end, we perform
the eigenvalue-eigenvector decomposition
 1 1 1
= Q diag{θ1 , θ2 , · · · , θn }QT
2
W o K cW o
2 2
(15.22)

where QQT = I n . As a result, we can write

−1  1
n
 1 1
= QΛ−2 QT
2
n θi W o2 K c W o2 (15.23)
i=1

where
Λ = diag{λ1 , λ2 , · · · , λn }
θ1 + θ2 + · · · + θn 12
λi = for i = 1, 2, · · · , n
nθi
Now, an n × n orthogonal matrix Z such that
⎡ ⎤
1 ∗ ··· ∗
⎢ ∗ 1 . . . ... ⎥
ZΛ−2 Z T = ⎢ ⎣ .. . . . .

⎦ (15.24)
. . . ∗
∗ ··· ∗ 1
364 Roundoff Noise Analysis and Minimization

can be obtained by numerical manipulations [3, p. 278]. By choosing U o =


QZ T in (15.20), the optimal coordinate transformation matrix T satisfying
(15.13) and (15.19) can be constructed as

1 
n 1
− 12
 1 1 1
QZ T
2 4
T =√ θi Wo W o2 K c W o2 (15.25)
n
i=1

By substituting (15.25) into (15.10), we can construct the optimal real-


ization (A, b, c, d)n which minimizes tr[T T W o T ] subject to l2 -scaling
constraints in (15.13).

15.3 Filters Quantized before Multiplications


15.3.1 State-Space Model with High-Order Error Feedback
Again we consider a stable, controllable and observable state-space digital
filter (A, b, c, d)n of order n described by (15.1). When the quantization is
performed before matrix-vector multiplication, an actual state-space digital
filter (A, b, c, d)n can be expressed as

x̃(k + 1) = AQ[x̃(k)] + bu(k)


(15.26)
ỹ(k) = cQ[x̃(k)] + du(k)

where
e(k) = x̃(k) − Q[x̃(k)]
A block diagram of an actual state-space digital filter in (15.26) is shown
in Figure 15.5. By taking the quantization performed before matrix-vector
multiplication into account, an FWL implementation of (15.1) with error
feedforward and high-order error feedback can be obtained as

u(k) y(k)
b z -1In Q[ ] c
x(k)

d
Figure 15.5 Block diagram of an actual state-space digital filter.
15.3 Filters Quantized before Multiplications 365

N
 −1
x̃(k + 1) = AQ[x̃(k)] + bu(k) + D i e(k − i)
i=0
(15.27)
ỹ(k) = cQ[x̃(k)] + du(k) + he(k)

where h and D 0 , D 1 , · · · , D N −1 are referred to as a 1 × n error-feedforward


vector and n × n high-order error feedback matrices, respectively. A block
diagram illustrating a state-space model with N th-order error feedback and
an error feedforward path is shown in Figure 15.6.
The coefficient matrices A, b, c, and d in (15.27) are assumed to have
exact fractional Bc -bit representations. The FWL state-variable vector x̃(k)
and each output ỹ(k) has B-bit fractional representations, while the input
u(k) is a (B − Bc )-bit fraction. The quantizer Q[·] in (15.27) rounds the B-bit

DN-1 e(k-N+1)

z -1In

DN-2 e(k-N+2)

D1 e(k-1)

z -1In

D0 h
e(k)

u(k) b z-1In Q[.] c ~y(k)


~x(k)

d
Figure 15.6 State-space model with N th-order error feedback and error feedforward.
366 Roundoff Noise Analysis and Minimization

fraction x̃(k) to (B − Bc )-bit after the multiplications and additions, where


the sign bit is not counted. It is assumed that the roundoff error e(k) can be
modeled as a Gaussian random process with zero mean and covariance σ 2 I n .
By subtracting (15.27) from (15.1), we obtain

N
 −1
Δx(k + 1) = AΔx(k) + Ae(k) − D i e(k − i)
i=0 (15.28)
Δy(k) = cΔx(k) + (c − h)e(k)

where
Δx(k) = x(k) − x̃(k), Δy(k) = y(k) − ỹ(k)
By taking the z-transform on both sides of (15.28), we have

N
 −1
z[ΔX(z) − Δx(0)] = AΔX(z) + AE(z) − D i z −i E(z)
i=0

ΔY (z) = cΔX(z) + (c − h)E(z)


(15.29)
where ΔX(z), ΔY (z) and E(z) represent the z-transforms of Δx(k), Δy(k)
and e(k), respectively. Setting Δx(0) = 0 leads (15.29) to

ΔY (z) = H e (z)E(z)
N
 −1 (15.30)
H e (z) = c(zI n − A)−1 A− D i z −i + c − h
i=0

15.3.2 Formula for Noise Gain


Based on the model developed above, we now define the normalized noise
gain Je1 (h, D) = σout2 /σ 2 with h and D = [D , D , · · · , D
0 1 N −1 ] in terms
of the transfer function H e (z) as
 1  dz 
Je1 (h, D) = tr HH
e (z)H e (z) (15.31)
2πj |z|=1 z

In order to derive an easy-to-use formula to evaluate and minimize the noise


gain, we write H e (z) in (15.30) as
15.3 Filters Quantized before Multiplications 367


 N
 −1
H e (z) = c Ak−1 z −k A − D i z −i + c − h
k=1 i=0
 ∞ ∞ N −1

  
=c Ak z −k − Al−1 D i z −(l+i) + c − h (15.32)
k=1 l=1 i=0

 N
 −1
= c Ak − Ak−i−1 D i z −k + c − h
k=1 i=0

where Ai = 0 for i < 0. By substituting (15.32) into (15.31) and making use
of the Cauchy integral theorem
 
1 dz 1, k=0
zk = (15.33)
2πj C z 0, k = 0
where C is a counterclockwise contour that encircles the origin, we obtain
 N
 −1 
Je1 (h, D) = tr W o − (AT )i+1 W o D i + D Ti W o Ai+1
i=0
N
 −1 N
 −1  
+ D Ti (AT )j−i W o + W o Ai−j D j
i=0 j=0
N
 −1 
− D Ti W o D i − 2hT c + hT h
i=0
(15.34)
where W o is the observability Grammian of the filter in (15.1), that can be
obtained by solving the Lyapunov equation in (15.7).
It is useful to note that if the high-order error feedback matrices
D 0 , D 1 , · · · , D N −1 are diagonal, then the formula for the noise gain can
be considerably simplified to
 N
 −1
Je1 (h, D) = tr W o − cT c − 2 W o Ai+1 D i
i=0
N
 −1 N
 −1 
+ W o A|i−j| D i D j + (c − h)(c − h)T
i=0 j=0
(15.35)
368 Roundoff Noise Analysis and Minimization

It should also be noted that the l2 -scaling constraints on the state-variable


vector x(k) involve the controllability Grammian K c of the filter in (15.1),
which can be computed by solving the Lyapunov equation in (15.8).

15.3.3 Problem Formulation


A different yet equivalent state-space description of (15.1), (A, b, c, d)n , can
be obtained via a coordinate transformation in (15.9) as shown in (15.10).
We now choose the error feedforward vector as h = c to eliminate the
last term in (15.35). With an equivalent state-space realization as specified in
(15.10) and assuming the use of high-order diagonal error feedback matrices,
the normalized noise gain is then found to be
 N
 −1
T
Je2 (D, T )= tr T (W o −cT c)T −2 T T W o Ai+1 T D i
i=0
(15.36)
N
 −1 N
 −1 
+ T T W o A|i−j| T D i D j
i=0 j=0

where the noise gain is denoted as Je2 (D, T ) to reflect the fact that the
noise gain is now dependent on both high-order error feedback matrices
D 0 , D 1 , · · · , D N −1 as well as state-space coordinate transformation T . For-
mula (15.36) provides an analytic foundation for the minimization of the noise
gain by jointly optimize the error feedback loop and state-space coordinate
transformation. Formally, the problem being considered here can be stated as
to jointly deign the high-order error feedback matrices D 0 , D 1 , · · · , D N −1
and coordinate transformation matrix T to minimize the noise gain Je2 (D, T )
in (15.36) subject to the l2 -scaling constraints in (15.13), assuming the error
feedforward vector h = c is chosen so as to eliminate the last term in (15.35).

15.3.4 Joint Optimization of Error Feedback and Realization


15.3.4.1 The Use of Quasi-Newton Algorithm
In what follows, it is assumed that the high-order error feedback matrices
are all diagonal so that (15.36) is a valid objective function to be minimized.
Because the constraints in (15.13) are nonconvex and highly nonlinear with
respect to matrix T , it is beneficial if these constraints can be eliminated
so as to work with an unconstrained problem that admits fast Newton-like
algorithms. To this end, we define
15.3 Filters Quantized before Multiplications 369

− 12
T̂ = T T K c (15.37)
which leads (15.13) to
−T −1
(T̂ T̂ )ii = 1 for i = 1, 2, · · · , n (15.38)
−1
These constraints are always satisfied if matrix T̂ assumes the form
 
−1 t1 t2 tn
T̂ = , ,··· , (15.39)
||t1 || ||t2 || ||tn ||
Substituting (15.37) into (15.38), we obtain
   T N
 −1
T
Je3 (D, T̂ ) = tr T̂ V 0 − ĉT ĉ T̂ − 2 T̂ V p+1 T̂ D p
p=0
(15.40)
N
 −1 N
 −1
T

+ T̂ V |p−q| T̂ D p D q
p=0 q=0

where 1 1 1
V p = K c2 W o Ap K c2 , ĉ = cK c2
Following the foregoing arguments, the problem of obtaining T and
D 0 , D 1 , · · · , D N −1 that jointly minimize (15.36) subject to the l2 -scaling
constraints in (15.13) is now converted into an unconstrained optimization
problem of obtaining T̂ and D 0 , D 1 , · · · , D N −1 that jointly minimize
Je3 (D, T̂ ) in (15.40).
Let x be the column vector that collects the variables in [t1 , t2 , · · · , tn ] and
D 0 , D 1 , · · · , D N −1 . Then Je3 (D, T̂ ) in (15.40) is a function of x, denoted
by J(x). The proposed algorithm starts with an initial point x0 obtained from
the assignment T̂ = D 0 = D 1 = · · · = D N −1 = I n . In the kth iteration,
a quasi-Newton algorithm updates the most recent point xk to point xk+1 as
[21]
xk+1 = xk + αk dk (15.41)
where
 
dk = −S k ∇J(xk ), αk = arg min J(xk + αdk )
α
  T
γ Tk S k γ k δ k δ Tk − δ k γ Tk S k +S k γ k δ k ,
S k+1 = S k + 1+ S0 = I
γ Tk δ k γ k δk
T γ k δk
T

δ k = xk+1 − xk , γ k = ∇J(xk+1 ) − ∇J(xk )


370 Roundoff Noise Analysis and Minimization

∇J(x) denotes the gradient of J(x) with respect to x, and S k is a positive-


definite approximation of the inverse Hessian matrix of J(x).
The iteration process (15.41) continues until
|J(xk+1 ) − J(xk )| < ε (15.42)
is satisfied where ε > 0 is a prescribed tolerance. If the iteration is terminated
at step k, the xk is viewed as a solution point.

15.3.4.2 Gradient of J(x)


The implementation efficiency and solution accuracy of the quasi-Newton
algorithm greatly depends on how the gradient ∇J(x) is evaluated. With
high-order diagonal error feedback matrices, we derive closed-form formulas
for computing the partial derivatives of J(x) with respect to the elements of T
as well as those of D = [D 0 , D 1 , · · · , D N −1 ]. The closed-form expressions
for the gradient of J(x) can be derived below.
Each term of the objective function in (15.40) has the form J(x) =
T
tr[T̂ V T̂ ] which, in the light of (15.39), can be expressed as
    
t1 t2 tn −1 t1 t2 tn −T
J(x) = tr , ,··· , V , ,··· ,
||t1 || ||t2 || ||tn || ||t1 || ||t2 || ||tn ||
(15.43)
To compute ∂J(x)/∂tij , we perturb the ith component of vector tj by a small
amount, say Δ, and keep the rest of T̂ unchanged. If we denote the perturbed
−1
jth column of T̂ by t̃j /||t̃j ||, then a linear approximation of t̃j /||t̃j || can
be obtained as
t̃j tj  t  tj
j
 + Δ∂ /∂tij = − Δ g ij (15.44)
||t̃j || ||tj || ||tj || ||tj ||
where  t 
j 1
g ij = −∂ /∂tij = (tij tj − ||tj ||2 ei )
||tj || ||tj ||3
Now let T̂ ij be the matrix obtained from T̂ with a perturbed (i, j)th
component, then we obtain
−1 −1
T̂ ij = T̂ − Δg ij eTj (15.45)
and up to the first-order, the matrix inversion formula [22, p. 655] gives
ΔT̂ g ij eTj T̂
T̂ ij = T̂ +  T̂ + ΔT̂ g ij eTj T̂ (15.46)
1− ΔeTj T̂ g ij
15.3 Filters Quantized before Multiplications 371

For convenience, we define T̂ ij = T̂ + ΔS with S = T̂ g ij eTj T̂ and write


T    T
T̂ ij V T̂ ij = T̂ + ΔS V T̂ + ΔS
(15.47)
T T
= T̂ V T̂ + ΔSV T̂ + ΔT̂ V S T + Δ2 SV S T
which implies that
 T  T    T
tr T̂ ij V T̂ ij − tr T̂ V T̂  Δ tr S V + V T T̂ (15.48)
provided that Δ is sufficiently small. Hence
 T  T  T
∂tr T̂ V T̂ tr T̂ ij V T̂ ij − tr T̂ V T̂
= lim
∂tij Δ→0 Δ (15.49)
 T
= tr S(V + V T )T̂
By substituting S = T̂ g ij eTj T̂ into (15.49), we obtain
 T
∂tr T̂ V T̂    T
= tr T̂ g ij eTj T̂ V + V T T̂
∂tij (15.50)
  T
= eTj T̂ V + V T T̂ T̂ g ij
On comparing (15.40) with (15.50), the components of the gradient of J(x)
with respect to t1 , t2 , · · · , tn are found to be

∂J(x)    T N−1
  T
= 2eTj T̂ V 0 − ĉT ĉ T̂ − T̂ V p+1 + V Tp+1 T̂ D p
∂tij
p=0

1
N
 −1 N
 −1
  T 
+ T̂ V |p−q| +V T|p−q| T̂ D p D q T̂ g ij
2
p=0 q=0
(15.51)
for i, j = 1, 2, · · · , n.
Let the high-order error feedback matrices assume the form
D p = diag{dp1 , dp2 , · · · , dpn } for p = 0, 1, · · · , N − 1. (15.52)
The gradients of J(x) with respect to the diagonal D p are given by
  N −1
∂J(x)  T T
= −2 T̂ V p+1 T̂ ii + 2 dqi T̂ V |p−q| T̂ ii (15.53)
∂dpi
q=0

for p = 0, 1, · · · , N − 1 and i = 1, 2, · · · , n.
372 Roundoff Noise Analysis and Minimization

15.3.5 Analytical Method for Separate Optimization


Here the term “separate optimization” refers to a procedure where the opti-
mization of transformation matrix T and that of the error feedback matrices
{D 0 , D 1 , · · · , D N −1 } are carried out separately as two different steps. First,
we fix the error feedback matrices to D i = 0 for i = 0, 1, · · · , N − 1 in
(15.27) so that the objective function in (15.36) is reduced to
 
Je2 (0, T ) = tr T T (W o − cT c)T (15.54)
which is minimized with respect to matrix T subject to the l2 -scaling
constraints in (15.13). Second, with T optimized in the first step, (15.36)
is minimized under the fixed T with respect to matrices D 0 , D 1 , · · · , D N −1 .
To perform the first step, we define the Lagrange function
   
Jo (P , λ) = tr XP + λ tr[KP −1 ] − n (15.55)

where P = T T T and X = W o − cT c. By applying the same manner as in


Section 15.2.2, we arrive at
1 
n 1
1
 1 1
1 4
X − 2 X 2 K c X 2 QZ T
2
T =√ σi (15.56)
n
i=1

where σi2 for i = 1, 2, · · · , n denote the eigenvalues of K c X. This is the


coordinate transformation matrix T which minimizes (15.54) subject to the
l2 -scaling constraints in (15.13).
In the second step, suppose that D p for p = 0, 1, · · · , N − 1 are diagonal
matrices. In this case, matrix D p assumes the form
D p = diag{αp1 , αp2 , · · · , αpn } for p = 0, 1, . . . , N − 1 (15.57)
It follows that
N
 −1
∂Je2 (D, T )
= −2 T T W o Ap+1 T +2 αqi T T W o A|p−q| T =0
∂αpi ii ii
q=0
(15.58)
for i = 1, 2, · · · , n. As a result, matrix D p can be derived from
⎡ ⎤ ⎡ ⎤
α0i ri (1, 0)
⎢ α1i ⎥ ⎢ ri (2, 0) ⎥
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ = R−1i ⎢ .. ⎥ (15.59)
⎣ . ⎦ ⎣ . ⎦
αN −1,i ri (N, 0)
15.4 Numerical Experiments 373

where
⎡ ⎤
ri (0, 0) ri (0, 1) ··· ri (0, N − 1)
⎢ ri (1, 0) ri (1, 1) ··· ri (1, N − 1) ⎥
⎢ ⎥
Ri = ⎢ .. .. .. .. ⎥
⎣ . . . . ⎦
ri (N − 1, 0) ri (N − 1, 1) · · · ri (N − 1, N − 1)
 
with ri (p, q) = T T W o A|p−q| T ii for i = 1, 2, · · · , n.

15.4 Numerical Experiments


15.4.1 Filter Description and Initial Roundoff Noise
We consider a stable 3rd-order lowpass state-space digital filter (A, b, c, d)3
described by
⎡ ⎤ ⎡ ⎤
0 1 0 0
A= ⎣ 0 0 1 ⎦ , b= 0 ⎦⎣
0.339377 −1.152652 1.520167 1
 
c = 0.093253 0.128620 0.314713 , d = 0.065959
The controllability and observability Grammians K c and W o were computed
from (15.7) and (15.8) as
⎡ ⎤
5.215397 3.869762 1.184455
K c = ⎣ 3.869762 5.215397 3.869762 ⎦
1.184455 3.869762 5.215397
⎡ ⎤
0.138134 −0.313522 0.336218
W o = ⎣ −0.313522 0.872712 −0.804183 ⎦
0.336218 −0.804183 1.123823
The eigenvalues of K c W o were as
θ12 = 2.499998, θ22 = 0.049748, θ13 = 0.729367
When a coordinate transformation defined by
T o = diag{ 2.283724, 2.283724, 2.283724 }
was applied to the above filter (A, b, c, d)3 , its controllability and observ-
ability Grammians K oc = T −1 o K cT o
−T
and W oo = T To W o T o were
derived as
374 Roundoff Noise Analysis and Minimization
⎡ ⎤
1.000000 0.741988 0.227107
K oc = ⎣ 0.741988 1.000000 0.741988 ⎦
0.227107 0.741988 1.000000
⎡ ⎤
0.720426 −1.635144 1.753511
W oo = ⎣ −1.635144 4.551538 −4.194133 ⎦
1.753511 −4.194133 5.861185
The original noise gain subject to l2 -scaling constraints was then computed
from (15.14) as

J(P , 0) = tr[T To W o T o ] = tr[W oo ] = 11.133150

where P = T o T To .

15.4.2 The Use of Analytical Method in Section 15.2.2


The optimal symmetric and positive-definite matrix P which minimizes
(15.14) were obtained from (15.18) as
⎡ ⎤
12.205669 4.936938 −0.320033
P = ⎣ 4.936938 4.593693 2.247419 ⎦
−0.320033 2.247419 3.190823
and the minimum value of (15.14) was found to be

min J(P , λ) = 2.355360


P ,λ
As a result, the optimal coordinate transformation matrix T was computed
from (15.25) as
⎡ ⎤
−1.961239 2.849950 0.486823
T = ⎣ 0.382807 2.067061 −0.417624 ⎦
1.519389 0.874919 0.341756

The optimal realization that minimizes the roundoff noise tr[T T W o T ] subject
to l2 -scaling constraints in (15.13) was then constructed from (15.10) as
⎡ ⎤ ⎡ ⎤
0.542471 −0.340493 0.565793 0.388814
A = ⎣ 0.565793 0.488848 0.159318 ⎦, b = ⎣ 0.111998 ⎦
−0.340493 0.012490 0.488848 0.910741
 
c = 0.344517 0.806980 0.099238
15.4 Numerical Experiments 375

In this case, the controllability and observability Grammians K c =


T −1 K c T −T and W o = T T W o T became
⎡ ⎤
1.000000 0.541747 0.541747
K c = ⎣ 0.541747 1.000000 0.036160 ⎦
0.541747 0.036160 1.000000
⎡ ⎤
0.785120 0.425337 0.425337
W o = ⎣ 0.425337 0.785120 0.028390 ⎦
0.425337 0.028390 0.785120
and
tr[W o ] = 2.355360
which coincides with the minimum value of J(P , λ) in (15.14).

15.4.3 The Use of Iterative Method in Section 15.3.4


In what follows, the simulation was carried out for the state-space model
specified by (T −1 −1
o AT o , T o b, cT o , d)3 . Simulation results are shown for
two system set-up’s—the first employed a static error feedback (i.e. N = 1)
while the second used a dynamic error feedback with N = 2. In both cases
the error feedback matrices involved were all diagonal.
In the case of N = 1, the quasi-Newton algorithm in (15.41) was applied
to minimize (15.40) with tolerance ε = 10−8 in (15.42). It took the algorithm
64 iterations to converge to the solution
⎡ ⎤
5.899503 −0.516441 1.228943
T̂ = ⎣ 5.935833 −0.493021 2.377971 ⎦
4.442394 −1.380516 1.478971
D 0 = diag{ 0.390401 0.574984 0.659618 }
The minimized noise gain was found to be
Je3 (D, T̂ ) = 0.273955

The profile of Je3 (D, T̂ ) during the first 64 iterations of the algorithm is
depicted in Figure 15.7. The values of function J(x) at the initial point and
seven subsequent iterates were found to be
   
J(x0 ) J(x1 ) J(x2 ) J(x3 ) 1.2156 0.7980 0.6502 0.5258
=
J(x4 ) J(x5 ) J(x6 ) J(x7 ) 0.4529 0.4132 0.3847 0.3564
376 Roundoff Noise Analysis and Minimization

Figure 15.7 Profile of Je3 (D, T̂ ) during the first 64 iterations.

The coordinate transformation matrix T was then computed from (15.37) as


⎡ ⎤
5.119779 5.187942 3.438106
T = ⎣ 2.677953 3.209672 1.472176 ⎦
1.013689 2.060072 0.832834

which yields an equivalent realization in (15.10) with


⎡ ⎤ ⎡ ⎤
0.561391 0.228938 −0.345628 −0.704730
A = ⎣ −0.410493 0.581361 0.374739 ⎦, b = ⎣ 0.346362 ⎦
0.562335 −0.284606 0.377416 0.526789
 
c = 2.605487 3.528241 1.763191

From (15.11), it follows that the corresponding controllability and observabi-


lity Grammians K c and W o became
15.4 Numerical Experiments 377
⎡ ⎤
1.000000 −0.692295 −0.317993
K c = ⎣ −0.692295 1.000000 −0.447851 ⎦
−0.317993 −0.447851 1.000000
⎡ ⎤
8.140416 11.841925 6.169088
W o = ⎣ 11.841925 18.715668 9.945036 ⎦
6.169088 9.945036 5.650477
In the case of N = 2, the quasi-Newton algorithm in (15.41) was applied to
minimize (15.40) with tolerance ε = 10−8 in (15.42). It took the algorithm
36 iterations to converge to the solution
⎡ ⎤
1.367924 −0.014714 0.386079
T̂ = ⎣ −0.849858 1.032268 0.308031 ⎦
0.078696 0.047338 1.218360
D 0 = diag{ 0.478043 0.923417 1.187428 }
D 1 = diag{ −0.060347 −0.622454 −0.584187 }
The minimized noise gain in this case was found to be
Je3 (D, T̂ ) = 0.031801
which is approximately nine times smaller than what the best static error-
feedback can achieve. The profile of Je3 (D, T̂ ) during the first 36 iterations
of the algorithm is depicted in Figure 15.8. The values of function J(x) at the
initial point and seven subsequent iterates were found to be
   
J(x0 ) J(x1 ) J(x2 ) J(x3 ) 7.7554 2.5386 1.3188 0.4822
=
J(x4 ) J(x5 ) J(x6 ) J(x7 ) 0.3056 0.1281 0.0721 0.0575
The coordinate transformation matrix T was then computed from (15.37)
as ⎡ ⎤
1.234828 −0.312148 0.118222
T = ⎣ 0.747764 0.581564 0.598990 ⎦
0.371646 0.705802 1.120151
which yields an equivalent realization in (15.10) with
⎡ ⎤ ⎡ ⎤
0.588282 0.709731 0.690452 −0.219057
A = ⎣ −0.081529 0.771527 0.873168 ⎦, b = ⎣ −0.557861 ⎦
−0.034786 −0.456772 0.160358 0.815097
 
c = 0.749725 0.611620 1.006192
378 Roundoff Noise Analysis and Minimization

Figure 15.8 Profile of Je3 (D, T̂ ) during the first 36 iterations.

From (15.11), it follows that the corresponding controllability and observabil-


ity Grammians K c and W o became
⎡ ⎤
1.000000 0.666020 −0.546135
K c = ⎣ 0.666020 1.000000 −0.496903 ⎦
−0.546135 −0.496903 1.000000
⎡ ⎤
0.711711 0.651290 1.285624
W o = ⎣ 0.651290 0.907285 1.413957 ⎦
1.285624 1.413957 3.602021

The above and several other simulation results regarding the noise gain
Je3 (D, T̂ ) in (15.40) plus (ĉ − h)(ĉ − h)T were summarized in Table 15.1
where the column with “Infinite Precision” shows the value of Je3 (D, T̂ )
derived from the optimal T̂ and D. The column with “3-Bit Quantization”
means that of Je3 (D, T̂ ) + (ĉ − h)(ĉ − h)T where each entry of the optimal
15.5 Summary 379

Table 15.1 Performance comparison


Infinite 3-Bit
N Optimization Precision Quantization
Separate 0.644452 0.659679
1
Joint 0.273955 0.295692
Separate 0.353326 0.373479
2
Joint 0.031801 0.046057

D was rounded to a power-of-two representation with 3 bits after the binary


point, matrix T̂ was updated so as to minimize Je3 (D, T̂ ) with such a D fixed,
and each entry of ĉ derived from the updated T̂ was rounded to a power-of-two
representation with 3 bits after the binary point to set a new vector h.
The proposed joint optimization strategy was evaluated in comparison with
the separate optimization technique shown in Section 15.3.5. The separate
optimization technique finds an optimal coordinate transformation T first in
absence of error feedback. With T fixed, it then finds an optimal error-feedback
matrix D as well as a feedforward vector h. Our simulations considered two
cases, namely the case where the system parameters were implemented with
infinite precision and the case where the system parameters were implemented
using 3-bit quantization. Table 15.1 includes numerical results obtained using
the separate optimization and joint optimization, respectively, where each
optimization procedure was applied with static (N = 1) and dynamic (N = 2)
error-feedback. From Table 15.1, it is observed that (i) the use of a dynamic
error-feedback (i.e. with N > 1) led to considerable improvement in roundoff
noise reduction relative to static error-feedback (i.e. with N = 1) for both
separate and joint optimization; (ii) in each of the four scenarios, joint
optimization outperforms its separate-optimization counterpart in a significant
manner; and (iii) for both infinite precision and 3-bit quantization cases, the
best performance was achieved when the joint optimization technique was
employed in conjunction with a dynamic error-feedback.

15.5 Summary
For state-space digital filters, two techniques for minimizing the roundoff
noise subject to l2 -scaling constraints have been presented. One has relied
on the lines studied by Mullis-Roberts and Hwang with the relaxation of l2 -
scaling constraints into a single constraint on matrix trace, and the optimal
matrix solution has been analytically obtained in closed form. The other has
380 Roundoff Noise Analysis and Minimization

been a joint optimization technique of high-order error feedback and state-


space realization for minimizing the effects of roundoff noise at the filter
output subject to l2 -scaling constraints, and an efficient quasi-Newton algo-
rithm has been employed to minimize the objective function iteratively. The
simulation results in numerical experiments have demonstrated the validity
and effectiveness of the present techniques.

References
[1] S. Y. Hwang, “Roundoff noise in state-space digital filtering: A general
analysis,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24,
no. 3, pp. 256–262, June 1976.
[2] C. T. Mullis and R. A. Roberts, “Synthesis of minimum roundoff noise
fixed point digital filters,” IEEE Trans. Circuits Syst., vol. CAS-23, no. 9,
pp. 551–562, Sept. 1976.
[3] S. Y. Hwang, “Minimum uncorrelated unit noise in state-space digital
filtering,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25,
no. 4, pp. 273–281, Aug. 1977.
[4] L. B. Jackson, A. G. Lindgren and Y. Kim, “Optimal synthesis of second-
order state-space structures for digital filters,” IEEE Trans. Circuits Syst.,
vol. CAS-26, no. 3, pp. 149–153, Mar. 1979.
[5] H. A. Spang, III and P. M. Shultheiss, “Reduction of quantizing noise
by use of feedback,” IRE Trans. Commun. Syst., vol. CS-10, no. 4,
pp. 373–380, Dec. 1962.
[6] T. Thong and B. Liu, “Error spectrum shaping in narrowband recursive
digital filters,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-
25, no. 2, pp. 200–203, Apr. 1977.
[7] T.-L. Chang and S.A. White, “An error cancellation digital-filter structure
and its distributed-arithmetic implementation,” IEEE Trans. Circuits
Syst., vol. CAS-28, no. 4, pp. 339–342, Apr. 1981.
[8] D. C. Munson and D. Liu, “Narrow-band recursive filters with error
spectrum shaping,” IEEE Trans. Circuits Syst., vol. CAS-28, no. 2,
pp. 160–163, Feb. 1981.
[9] W. E. Higgins and D. C. Munson, “Noise reduction strategies for
digital filters: Error spectrum shaping versus the optimal linear state-
space formulation,” IEEE Trans. Acoust., Speech, Signal Process., vol.
ASSP-30, no. 6, pp. 963–973, Dec. 1982.
References 381

[10] M. Renfors, “Roundoff noise in error-feedback state-space filters,” in


Pro (c. Int. Conf. Acoust., Speech, Signal Process. (ICASSP’83), Apr.
1983, pp. 619–622.
[11] W. E. Higgins and D. C. Munson, “Optimal and suboptimal error
spectrum shaping for cascade-form digital filters,” IEEE Trans. Circuits
Syst., vol. CAS-31, no. 5, pp. 429–437, May 1984.
[12] T. I. Laakso and I. O. Hartimo, “Noise reduction in recursive digital filters
using high-order error feedback,” IEEE Trans. Signal Process., vol. 40,
no. 5, pp. 1096–1107, May 1992.
[13] P. P. Vaidyanathan, “On error-spectrum shaping in state-space digital
filters,” IEEE Trans. Circuits Syst., vol. CAS-32, no. 1, pp. 88–92, Jan.
1985.
[14] D. Williamson, “Roundoff noise minimization and pole-zero sensitivity
in fixed-point digital filters using residue feedback,” IEEE Trans. Acoust.,
Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1210–1220, Oct. 1986.
[15] D. Williamson, “Delay replacement in direct form structures”, IEEE
Trans. Acoust., Speech, Signal Process., vol. ASSP-36, no. 4, pp.
453–460, Apr. 1988.
[16] G. Li and M. Gevers, “Roundoff noise minimization using delta-operator
realizations,” IEEE Trans. Signal Process., vol. 41, no. 2, pp. 629–637,
Feb. 1993.
[17] G. Li and Z. Zhao, “On the generalized DFIIt structure and its state-space
realization in digital filter implementation,” IEEE Trans. Circuits Syst. I,
vol. 51, no. 4, pp. 769–778, Apr. 2004.
[18] T. Hinamoto, H. Ohnishi and W.-S. Lu, “Roundoff noise minimization
of state-space digital filters using separate and joint error feedback/
coordinate transformation,” IEEE Trans. Circuits Syst. I, vol. 50, no. 1,
pp. 23–33, Jan. 2003.
[19] W.-S. Lu and T. Hinamoto, “Jointly optimized error-feedback and reali-
zation for roundoff noise minimization in state-space digital filters,”
IEEE Trans. Signal Process., vol. 53, no. 6, pp. 2135–2145, June 2005.
[20] T. Hinamoto, A. Doi and W.-S. Lu, “Jointly optimal high-order error-
feedback and realization for roundoff noise minimization in 1-D and
2-D state-space digital filters,” IEEE Trans. Signal Process., vol. 61, no.
23, pp. 5893–5904, Dec. 1, 2013.
[21] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York, NY:
Wiley, 1987.
[22] T. Kailath, Linear Systems, Engle Cliffs, NJ: Prentice Hall, 1980.
16
Generalized Transposed Direct-Form II
Realization

16.1 Preview
Delta operator is often applied to the implementation of IIR digital filters due
to its good numerical properties subject to fast sampling. When the sampling
rate is much higher than the underlying signal bandwidth, then finite-word-
length (FWL) effects become worse. If the poles of a narrow-band lowpass
filter cluster near the point z = 1 in the z plane, the filter becomes very
noisy and sensitive to coefficient quantization. For example, notch filters
have pole(s) close to the point z = 1 if the rejected signal component exists
at the normalized frequency near zero. To achieve good finite-word-length
performance subject to fast sampling, the forward shift operator z is replaced
by the delta operator, defined by
z−1
δ(z) =
Δ
where Δ is originally the sampling interval [1]. Later on, the parameter Δ is
often used to improve the finite-word-length effects as a free parameter by
several researchers. An implementation of δ −1 (z) is shown in Figure 16.1.
It is known that delta operator realizations generally yield better roundoff
noise performance and more robust coefficient sensitivity [10]. For example, a
high-order transfer function may be implemented as a series of second-order

Figure 16.1 Implementation of δ −1 (z).

383
384 Generalized Transposed Direct-Form II Realization

sections that are connected in cascade with each section implemented by a


direct form in delta operator. The roundoff noise of such an implementation
was analyzed in [2] where the transposed direct-form II is found to yield
the lowest roundoff noise gain at the output among all the direct forms. A
modified delta transposed direct-form II second-order section where the Δs
and filter coefficients at different branches are separately scaled to achieve
improved roundoff noise gain minimization was proposed [3]. Alternatively,
an nth-order delta transposed direct-form II IIR filter with minimum roundoff
noise gain and sensitivity has been derived by utilizing different coupling
coefficients at different branch nodes for better noise gain suppression [4].
Moreover, a set of special operators was employed to obtain the generalized
transposed direct-form II structure of a general order, say pth-order, IIR digital
filter and its equivalent state-space realization subject to l2 -scaling where p free
parameters in the operators are appropriately chosen to minimize the roundoff
noise gain or the coefficient sensitivity measure of each structure [5].
This chapter is written mainly along the line of the paper in [5],
but we elaborate the subject matter with further details. To begin with,
the transposed direct-form II structure of an nth-order IIR digital filter
using a set of special operators is introduced, and its equivalent state-
space realization is constructed. These are followed by analyzing the
roundoff noise and l2 -sensitivity for the generalized transposed direct-
form II structure and its equivalent state-space realization. Then, given
a transfer function and n free parameters in the operators, a concrete
procedure for evaluating the roundoff noise gain and the l2 -sensitivity mea-
sure of each structure is summarized. Finally, numerical experiments are
presented to demonstrate the validity and effectiveness of the techniques
addressed in this chapter.

16.2 Structural Transformation


Consider an SISO (single-input/single-output) time-invariant linear digital
filter described by

b0 z n + b1 z n−1 + · · · + bn−1 z + bn
H(z) = (16.1)
z n + a1 z n−1 + · · · + an−1 z + an
Let P be an (n + 1) × (n + 1) nonsingular matrix and

q(z) = P z (16.2)
16.2 Structural Transformation 385

where
 T  T
q(z) = q0 (z) q1 (z) · · · qn (z) , z = zn · · · z 1
By defining scalars {αi | i = 1, 2, · · · , n} and {βi | i = 0, 1, · · · , n} such that
   
κ 1 α1 · · · αn P = 1 a1 · · · an
    (16.3)
κ β0 β1 · · · βn P = b0 b1 · · · bn
the transfer function in (16.1) can be written as
β0 q0 (z) + β1 q1 (z) + · · · + βn qn (z)
H(z) =
q0 (z) + α1 q1 (z) + · · · + αn qn (z)
n
qi (z)
β0 + βi (16.4)
q0 (z)
i=1
= n
 qi (z)
1+ αi
q0 (z)
i=1

where κ is a scaling factor such that α0 = 1. From (16.4), it is obvious


that the filter is characterized by scalars {αi | i = 1, 2, · · · , n} and {βi | i =
0, 1, · · · , n} under the polynomial operators {qi (z)| i = 0, 1, · · · , n}. We
now consider a special set of polynomial operators, that leads to an interesting
structure for filter implementation.
Define
z − γi
ρi (z) = for i = 1, 2, · · · , n (16.5)
Δi
where {γi } and {Δi > 0} are two sets of constants to be discussed later. Let
the polynomial operators be chosen as
qi (z) = ρi+1 (z)ρi+2 (z) · · · ρn (z) for i = 0, 1, · · · , n − 1
(16.6a)
qn (z) = 1
which are equivalent to
⎡ ⎤ ⎡ ⎤
q0 (z) ρ1 (z)ρ2 (z) · · · ρn (z)
⎢ q1 (z) ⎥ ⎢ρ2 (z)ρ3 (z) · · · ρn (z)⎥
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ ⎢ .. ⎥
⎢ . ⎥ ⎢ = . ⎥ = Pz (16.6b)
⎢ ⎥ ⎢ ⎥
⎣qn−1 (z)⎦ ⎣ ρn (z) ⎦
qn (z) 1
386 Generalized Transposed Direct-Form II Realization

As an example, consider the case where n = 3. Then we have

z 3 − (γ1 + γ2 + γ3 )z 2 + (γ1 γ2 + γ2 γ3 + γ3 γ1 )z − γ1 γ2 γ3
q0 (z) =
Δ1 Δ2 Δ3
z 2 − (γ2 + γ3 )z + γ2 γ3
q1 (z) =
Δ2 Δ3
z − γ3
q2 (z) =
Δ3
q3 (z) = 1
(16.7a)
which are equivalent to
⎡ ⎤
q0 (z)
⎢q (z)⎥ 1
⎢ 1 ⎥
⎢ ⎥=
⎣q2 (z)⎦ Δ1 Δ2 Δ3
q3 (z)
⎡ ⎤⎡ 3 ⎤
1 −(γ1 + γ2 + γ3 ) γ1 γ2 + γ2 γ3 + γ3 γ1 −γ1 γ2 γ3 z
⎢0 Δ1 −(γ2 + γ3 )Δ1 ⎥⎢
γ2 γ3 Δ1 ⎥⎢z 2 ⎥
⎢ ⎥
·⎢ ⎥⎢ ⎥
⎣0 0 Δ1 Δ2 −γ3 Δ1 Δ2 ⎦⎣ z ⎦
0 0 0 Δ1 Δ2 Δ3 1
(16.7b)
hence
⎡ ⎤
1 −(γ1 + γ2 + γ3 ) γ1 γ2 + γ2 γ3 + γ3 γ1 −γ1 γ2 γ3
1 ⎢0 Δ1 −(γ2 + γ3 )Δ1 γ2 γ3 Δ1 ⎥
⎢ ⎥
P = ⎢ ⎥
Δ1 Δ2 Δ3 ⎣0 0 Δ1 Δ2 −γ3 Δ1 Δ2 ⎦
0 0 0 Δ1 Δ2 Δ3
(16.7c)
Referring to (16.7b), with the choice of {ρi (z)| i = 1, 2, · · · , n} in (16.5) it
is possible to specify the corresponding transformation matrix P and scalar
κ = Δ1 Δ 2 · · · Δ n .
From (16.6a), it follows that
qi (z) ρi+1 (z)ρi+2 (z) · · · ρn (z)
= = ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z) (16.8)
q0 (z) ρ1 (z)ρ2 (z) · · · ρn (z)
16.2 Structural Transformation 387

Hence the transfer function in (16.4) can be written as


n

β0 + βi ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z)
i=1
H(z) = n (16.9)

1+ αi ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z)
i=1

which can be expressed in terms of its difference equation as


n

y(k) = β0 + βi ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z) u(k)
i=1
(16.10)
n

− αi ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z) y(k)
i=1

The transposed direct-form II structure of a ρ operator-based IIR digital filter


specified by (16.10) is illustrated in Figure 16.2, and an implementation of
ρ−1
i (z) is shown in Figure 16.3.

Figure 16.2 Transposed direct-form II structure of a ρ operator-based IIR digital filter.

Figure 16.3 Implementation of ρ−1


i (z).
388 Generalized Transposed Direct-Form II Realization

From Figures 16.2 and 16.3, we obtain

y(k) = w1 (k) + β0 u(k)

wi (k) = ρ−1
i (z) [wi+1 (k) + βi u(k) − αi y(k)]
(16.11)
for i = 1, 2, · · · , n − 1
wn (k) = ρ−1
n (z) [βn u(k) − αn y(k)] , wn+1 (k) = 0

where wi (k) is the output of ρ−1 i (z).


It is seen that this structure has 3n+1 nontrivial parameters {αi }, {βi } and
{Δi } plus n free parameters {γi }. If γi = 0 and Δi = 1 for i = 1, 2, · · · , n,
then Figure 16.2 becomes identical to the transposed direct form II structure
[9, p. 155], while if γi = 1 for i = 1, 2, · · · , n, then Figure 16.2 is identical
to the direct-form delta operator-based filter structure [4]. With the n free
parameters {γi } one can enjoy more degrees of freedom to minimize the
FWL effects.

16.3 Equivalent State-Space Realization


16.3.1 State-Space Realization I
We now derive an equivalent state-space model from (16.11). Since the
operator ρ−1
i (z) in Figure 16.3 is expressed in terms of its transfer function as

Δi
ρ−1
i (z) = for i = 1, 2, · · · , n (16.12)
z − γi
we can write (16.11) as

y(k) = Δ1 x1 (k) + β0 u(k)


Δi
wi (k) = [wi+1 (k) + βi u(k) − αi {Δ1 x1 (k) + β0 u(k)}] (16.13)
z − γi
i = 1, 2, · · · , n

which leads to
(z − γi )xi (k) = −αi Δ1 x1 (k) + Δi+1 xi+1 (k) + (βi − αi β0 )u(k)
i = 1, 2, · · · , n
(16.14a)
16.3 Equivalent State-Space Realization 389

or, equivalently,

xi (k + 1) = −αi Δ1 x1 (k) + γi xi (k) + Δi+1 xi+1 (k) + (βi − αi β0 )u(k)


i = 1, 2, · · · , n
(16.14b)

where xn+1 (k) = 0. Using (16.14b) and the first equation in (16.13), an
equivalent state-space realization (Aρ , bρ , cρ , β0 )n can be constructed as

x(k + 1) = Aρ x(k) + bρ u(k)


(16.15)
y(k) = cρ x(k) + β0 u(k)
where
⎡ ⎤ ⎡ ⎤
−α1 Δ1 Δ2 · · · 0 γ1 0 · · · 0
⎢ .. .. ⎥ ⎢ ..
. .. ⎥
.
⎢ −α Δ 0 . . ⎥ ⎢ 0 γ2 ⎥
Aρ = ⎢ 2. 1 .. +
⎥ ⎢. . ⎥
⎣ .. .
..
. Δn ⎦ ⎣ .. .. .. 0 ⎦
.
−αn Δ1 0 ··· 0 0 · · · 0 γn
⎡ ⎤⎡ ⎤ ⎡ ⎤
−α1 1 · · · 0 Δ1 0 · · · 0 γ1 0 · · · 0
⎢ . .⎥ ⎢ . .. ⎥ ⎢ ..
. .. ⎥
.
⎢ −α 0 . . .. ⎥ ⎢ 0 Δ2 . . . ⎥ ⎢ 0 γ2 ⎥
=⎢ . 2 .. . . ⎥⎢ . ⎥+⎢ . . ⎥
⎣ .. . . 1⎦ ⎣ ..
. .. . . . 0 ⎦ ⎣ .. .. .. 0 ⎦
.
−αn 0 ··· 0 0 · · · 0 Δn 0 · · · 0 γn
 
bρ = β − β0 α, cρ = Δ1 0 · · · 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
x1 (k) α1 β1
⎢ x2 (k) ⎥ ⎢ α2 ⎥ ⎢ β2 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
x(k) = ⎢ .. ⎥ , α = ⎢ .. ⎥ , β = ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦ ⎣ . ⎦
xn (k) αn βn

From (16.15), the transfer function from the input u(k) to the output y(k) is
given by
H(z) = cρ (zI n − Aρ )−1 bρ + β0 (16.16)
390 Generalized Transposed Direct-Form II Realization

16.3.2 State-Space Realization II


We now consider constructing an equivalent state-space realization in the case
where Δi = 1 for i = 1, 2, · · · , n. Suppose that (16.10) is expressed with
Δi = 1 for i = 1, 2, · · · , n as
n

y(k) = β0 + β i ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z) u(k)
i=1
(16.17a)
n

− αi ρ−1 −1 −1
1 (z)ρ2 (z) · · · ρi (z) y(k)
i=1

where
1
ρ−1
i (z) = for i = 1, 2, · · · , n (16.17b)
z − γi
The transposed direct-form II structure of a ρ operator-based IIR digital filter
in (16.17a) is depicted in Figure 16.4, and an implementation of ρ−1 i (z) in
case Δi = 1 is drawn in Figure 16.5.

Figure 16.4 Transposed direct-form II structure of a ρ operator-based IIR digital filter.

Figure 16.5 Implementation of ρ−1


i (z) in case Δi = 1.
16.3 Equivalent State-Space Realization 391

In this case, (16.13) is written as


y(k) = x1 (k) + β0 u(k)
1  
xi (k) = xi+1 (k) + β i u(k) − αi {x1 (k) + β0 u(k)} (16.18)
z − γi
i = 1, 2, · · · , n
which is equivalent to
y(k) = x1 (k) + β0 u(k)

xi (k + 1) = −αi x1 (k) + γi xi (k) + xi+1 (k) + (β i − αi β0 )u(k)


i = 1, 2, · · · , n
(16.19)
where xn+1 (k) = 0. By virtue of (16.19), the corresponding equivalent state-
space realization (Aρ , bρ , cρ , β0 )n can be constructed as
x(k + 1) = Aρ x(k) + bρ u(k)
(16.20)
y(k) = cρ x(k) + β0 u(k)
where
⎡ ⎤ ⎡ ⎤
−α1
1 ··· 0 γ1 0 ··· 0
⎢ . . .. ⎥ ⎢ .. .. ⎥
⎢ −α 0 . . ⎥ ⎢ 0 γ2 . .⎥
Aρ = ⎢ . 2 . . ⎥+⎢ ⎥
⎣ .. .. . . 1⎦ ⎣ ... . . . ..
. 0⎦
−αn 0 · · · 0 0 ··· 0 γn
 
bρ = β − β0 α, cρ = 1 0 · · · 0
⎡ ⎤ ⎡ ⎤ ⎤ ⎡
x1 (k) α1 β1
⎢ x2 (k) ⎥ ⎢ α2 ⎥ ⎢β ⎥
⎢ ⎥ ⎢ ⎥ ⎢ 2⎥
x(k) = ⎢ .. ⎥ , α = ⎢ .. ⎥ , β=⎢ . ⎥
⎣ . ⎦ ⎣ . ⎦ ⎣ .. ⎦
xn (k) αn βn
If the state-space model in (16.15) is related to that in (16.20) by
x(k) = T −1 x(k), T = diag{t1 t2 · · · tn } (16.21)
then we obtain
Aρ = T −1 Aρ T , bρ = T −1 bρ , cρ = cρ T (16.22)
392 Generalized Transposed Direct-Form II Realization

By making use of cρ = cρ T and T Aρ = Aρ T , it follows that


   
1 0 · · · 0 = Δ1 t1 0 · · · 0 : t1 = Δ−1 1
⎡ −1 −1

−α1 Δ1 Δ1 0 ··· 0
⎢ . .. ⎥
⎢ −α2 t2 0 t2 . . . ⎥
⎢ ⎥
⎢ .. .. . . ⎥
⎢ . . . . . . 0 ⎥
⎢ ⎥
⎣−αn−1 tn−1 0 · · · 0 tn−1 ⎦
−αn tn 0 ··· 0 0 (16.23)
⎡ ⎤
−α1 Δ2 t2 0 ··· 0
⎢ .. .. ⎥
⎢ −α2 0 Δ 3 t3 . . ⎥
⎢ ⎥
=⎢ .. .. .. .. ⎥
⎢ . . . . 0 ⎥
⎣−α 0 ··· 0 Δ t ⎦
n−1 n n
−αn 0 ··· 0 0
which yields

T = diag Δ−1 −1 −1 −1 −1 −1
1 , Δ1 Δ2 , · · · , Δ1 Δ2 · · · Δn
(16.24)
α = T −1 α, β = T −1 β
because bρ = T −1 bρ .

16.3.3 Choice of {Δi } Satisfying l2 -Scaling Constraints


The controllability Grammian K ρ of the state-space model in (16.20) plays an
important role in the dynamic-range scaling of the state-variable vector x(k)
in (16.15), and matrix K ρ can be obtained by solving the Lyapunov equation
T T
K ρ = Aρ K ρ Aρ + bρ bρ (16.25)
With an equivalent state-space realization as specified in (16.22), the con-
trollability Grammian K ρ of the state-space model in (16.15) assumes the
form
K ρ = T K ρT T , T = diag{t1 , t2 , · · · , tn } (16.26)
If l2 -scaling constraints are imposed on the state-variable vector x(k) in
(16.15), it is required that the controllability Grammian K ρ of the state-space
model in (16.15) is subject to the constraints
eTi K ρ ei = eTi T K ρ T T ei = t2i eTi K ρ ei = 1 for i = 1, 2, · · · , n (16.27)
16.4 Analysis of Roundoff Noise 393

where ei indicates an n × 1 unit vector whose ith element equals unity. Con-

sequently, noting that T = diag Δ−1 −1 −1 −1 −1
1 , Δ1 Δ2 , . . ., Δ1 Δ2 · · · Δn
−1 in

(16.24), i.e., ti = Δ−1 −1 −1


1 Δ2 · · · Δi for i = 1, 2, · · · , n, we obtain


eT2 K ρ e2
Δ1 = eT1 K ρ e1 , Δ2 = , . . .,
eT1 K ρ e1

eTn K ρ en
Δn = (16.28)
eTn−1 K ρ en−1

Therefore, if the filter under discussion is l2 -scaled, then (16.5) and (16.6a)
imply that
n n
 n
1 1
q0 (z) = ρl (z) = (z − γl ) =  (z − γl )
Δ1 Δ2 · · · Δ n eT K e
l=1 l=1 n ρ n l=1
n
 n

1
qi (z) = ρl (z) = (z − γl ) (16.29)
Δi+1 Δi+2 · · · Δn
l=i+1 l=i+1

n
eTi K ρ ei 
= (z − γl )
eTn K ρ en l=i+1
qn (z) = 1 for i = 1, 2, · · · , n − 1

and we have
n
 
κ= Δi = eTn K ρ en (16.30)
i=1

The parameters {Δi } is called the coupling coefficients [4]. These parameters
affect the dynamic range of the signals at the branch nodes, which are actually
the state variables {xi (k)}. The analysis presented above suggests that one
can avoid overflow oscillations if these coupling coefficients are used to scale
the state variables.

16.4 Analysis of Roundoff Noise


16.4.1 Roundoff Noise of ρ-Operator Transposed
Direct-Form II Structure
In this section, we first investigate the effect of roundoff noise produced by
the term αi y(k) at the output. Due to the product quantization, for the actual
filter implemented by a FWL machine, (16.11) is written as
394 Generalized Transposed Direct-Form II Realization

ỹ(k) = w̃1 (k) + β0 u(k)


..
.
w̃i−1 (k) = ρ−1
i−1 (z) [w̃i (k) + βi−1 u(k) − αi−1 ỹ(k)]
(16.31)
w̃i (k) = ρ−1
i (z) [w̃i+1 (k) + βi u(k) − {αi ỹ(k) + εi (k)}]

w̃i+1 (k) = ρ−1


i+1 (z) [w̃i+2 (k) + βi+1 u(k) − αi+1 ỹ(k)]
..
.
where ỹ(k) is the actual output, w̃i (k) is the actual signal of wi (k), and

εi (k) = Q[αi ỹ(k)] − αi ỹ(k)
is the roundoff noise due to the quantizer Q[·]. Subtracting (16.11) from (16.31)
yields
Δy(k) = Δw1 (k)
..
.
Δwi−1 (k) = ρ−1
i−1 (z) [Δwi (k) − αi−1 Δy(k)]
(16.32)
Δwi (k) = ρ−1
i (z) [Δwi+1 (k) − αi Δy(k) − εi (k)]

Δwi+1 (k) = ρ−1


i+1 (z) [Δwi+2 (k) − αi+1 Δy(k)]
..
.
where
Δy(k) = ỹ(k) − y(k), Δwi (k) = w̃i (k) − wi (k)
By comparing (16.32) with (16.11), it is seen that the transfer function Hi (z)
from −εi (k) to Δy(k) is given by (16.16) with β0 = 0 and β replaced by ei ,
that is,

Hi (z) = cρ (zI n − Aρ )−1 ei for i = 1, 2, · · · , n (16.33)


Based on the model developed above, we now define the normalized noise
gain in terms of Hi (z) as

E[y(k)2 ] 1 dz
J1 (αi ) = 2
= HiH (z) Hi (z) (16.34)
E[εi (k) ] 2πj |z|=1 z
16.4 Analysis of Roundoff Noise 395

where AH denotes the conjugate transpose of matrix A. Substituting (16.33)


into (16.34) yields
J1 (αi ) = eTi W ρ ei for i = 1, 2, · · · , n (16.35)
where W ρ is the observability Grammian of the state-space model in (16.15)
which can be obtained by solving the Lyapunov equation
W ρ = ATρ W ρ Aρ + cTρ cρ (16.36)
Similarly, the roundoff noise gain produced by βi can be expressed as
J2 (βi ) = eTi W ρ ei for i = 1, 2, · · · , n (16.37)
Regarding the term β0 , the first equation in (16.32) is changed to
Δy(k) = Δw1 (k) + ε0 (k) (16.38)
where

ε0 (k) = Q[β0 u(k)] − β0 u(k)
By comparing (16.32) whose first equation is replaced by (16.38) with (16.11),
it is seen that the transfer function H0 (z) from ε0 (k) to Δy(k) is described
by (16.16) with β0 = 1 and β = 0, that is,
H0 (z) = −cρ (zI n − Aρ )−1 α + 1 (16.39)
Hence the roundoff noise gain caused by β0 is given by
J3 (β0 ) = αT W ρ α + 1 (16.40)
With wi+1 (k) replaced by Δi+1 xi+1 (k) in (16.11), the roundoff noise gain
due to Δi+1 can be viewed as a function of βi , i.e.,
J2 (Δi+1 ) = J2 (βi ) = eTi W ρ ei for i = 1, 2, · · · , n − 1 (16.41)
Similarly, w1 (k) replaced by Δ1 x1 (k) in (16.11), the roundoff noise gain due
to Δ1 can be considered as that produced by β0 , i.e.,
J3 (Δ1 ) = J3 (β0 ) = αT W ρ α + 1 (16.42)

As shown in Figure 16.3, parameter γi yields a multiplication γi xi (k).


This multiplication produces no roundoff noise if γi = 0, ±1. Let
ψ(γi )ωi (k) denote the roundoff noise due to γi , where ψ(γi ) = 1 for
all γi except γi = 0, ±1, for which ψ(γi ) = 0, and let Δy(k) be the
396 Generalized Transposed Direct-Form II Realization

corresponding output deviation. Then the transfer function from ψ(γi )ωi (k)
to Δy(k) becomes Hi (z) defined above. Actually, this roundoff noise can be
viewed as the one generated by βi u(k). Hence
J4 (γi ) = ψ(γi )J2 (βi ) = ψ(γi )eTi W ρ ei for i = 1, 2, · · · , n (16.43)
Based on the above analysis, the total roundoff noise gain of the filter
structure in Figure 16.2 is defined as
n
 n

Jρ = J1 (αi ) + J2 (βi ) + J3 (β0 )
i=1 i=1
(16.44)
n−1
 n

+ J2 (Δi+1 ) + J3 (Δ1 ) + J4 (γi )
i=1 i=1

which can be written as


n
    
Jρ = 3 + ψ(γi ) eTi W ρ ei − eTn W ρ en + 2 αT W ρ α + 1
i=1 (16.45)
     
= 3 tr W ρ + tr ΨW ρ − eTn W ρ en + 2 αT W ρ α + 1
where 
Ψ = diag ψ(γ1 ), ψ(γ2 ), · · · , ψ(γn )

16.4.2 Roundoff Noise of Equivalent State-Space Realization


Consider a stable, controllable and observable state-space model (A, b, c, d)n
described by
x(k + 1) = Ax(k) + bu(k)
(16.46)
y(k) = cx(k) + du(k)
where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is
a scalar output, and A, b, c, and d are n × n, n × 1, 1 × n, and 1 × 1 real
constant matrices, respectively. Section 15.2.1 states that the roundoff noise
gain due to product quantization associated with the A, b, c, and d matrices
in (16.46) can be expressed as
IS = tr[QW o ] + μ + ν (16.47a)
where W o is the observability Grammian of the state-space model in (16.46),
which can be obtained by solving the Lyapunov equation
16.5 Analysis of l2 -Sensitivity 397

W o = AT W o A + cT c (16.47b)
Q is a diagonal matrix whose ith diagonal element qi is the number of
coefficients in the ith rows of A and b that are neither 0 nor ±1, and μ + ν is
the number of neither 0 nor ±1 constants in c and d.
If (16.47a) is applied to the equivalent state-space realization in (16.15),
the corresponding roundoff noise gain, say ISρ , is given by

ISρ = tr[QW ρ ] + 2 (16.48)

where

q1 = 3, qi = 3 + ψ(γi ) for i = 2, 3, · · · , n − 1, qn = 2 + ψ(γn )

and d = β0 . Hence
   
ISρ = 3 tr W ρ + tr ΨW ρ − eTn W ρ en + 2 − ψ(γ1 ) eT1 W ρ e1 (16.49)

Notice that

Jρ − ISρ = 2 αT W ρ α + ψ(γ1 ) eT1 W ρ e1 > 0 (16.50)

This reveals that for a given set of {γi }, the equivalent state-space realization
always has lower roundoff noise gain than the corresponding ρ-operator
transposed direct form II structure in Figure 16.2.

16.5 Analysis of l2 -Sensitivity


In this section, the l2 -sensitivity of the transfer function H(z) in (16.16) with
respect to non-zero parameters is analyzed for both ρ-operator transposed
direct form II structure and its equivalent state-space realization. It is noted
that a mixture of l1 and l2 norms was employed in [4] to analyze the sensitivity,
whereas only a pure l2 norm was used in [5].

16.5.1 l2 -Sensitivity of ρ-Operator Transposed


Direct-Form II Structure
Definition 16.1
Let X be an m × n real matrix and let f (X) be a scalar complex function of
X, differentiable with respect to all entries of X. The sensitivity function of
f (X) with respect to X is then defined as
398 Generalized Transposed Direct-Form II Realization
⎡ ⎤
∂f (X) ∂f (X) ∂f (X)
⎢ ∂x ···
⎢ 11 ∂x12 ∂x1n ⎥⎥
⎢ ⎥
⎢ ∂f (X) ∂f (X) ∂f (X) ⎥

∂f (X) ⎢ ∂x21 · · · ⎥
=⎢ ∂x22 ∂x2n ⎥⎥ (16.51)
∂X ⎢ . .. .. ⎥
⎢ .. ..
. . ⎥
⎢ . ⎥
⎢ ⎥
⎣ ∂f (X) ∂f (X) ∂f (X) ⎦
···
∂xm1 ∂xm2 ∂xmn
where xij denotes the (i, j)th entry of matrix X.

Definition 16.2
Let X(z) be an m × n complex matrix-valued function of a complex variable
z and let xpq (z) be the (p, q)th entry of X(z). The l2 -norm of X(z) is then
defined as
 2π  m  n
 1
2
1
||X(z)||2 = |xpq (ejω )|2 dω
2π 0
p=1 q=1
(16.52)
  1
2
1 dz
= tr X(z)X H (z)
2πj |z|=1 z
Based on Definitions 16.1 and 16.2, the overall l2 -sensitivity measure for
ρ-operator transposed direct form II structure in Figure 16.2 is defined as
     
 ∂H(z) 2  ∂H(z) 2  ∂H(z) 2
Mρ =    +   +  
∂α 2  ∂β 2  ∂β0 2
n    
n
(16.53)
  ∂H(z) 2   ∂H(z) 2
+    
 ∂Δi  + ψ(γi ) 
∂γi 2
i=1 2 i=1
From (16.15) and (16.16), it follows that
∂H(z)
= −H(z)(zI n − ATρ )−1 cTρ
∂α
∂H(z)
= (zI n − ATρ )−1 cTρ
∂β
∂H(z)
= 1 − cρ (zI n − Aρ )−1 α
∂β0
16.5 Analysis of l2 -Sensitivity 399

∂H(z)  
= 1 − cρ (zI n − Aρ )−1 α eT1 (zI n − Aρ )−1 bρ
∂Δ1
∂H(z)
= cρ (zI n − Aρ )−1 ei−1 eTi (zI n − Aρ )−1 bρ for i = 2, 3, · · · , n
∂Δi
∂H(z)
= cρ (zI n − Aρ )−1 ei eTi (zI n − Aρ )−1 bρ for i = 1, 2, · · · , n
∂γi
(16.54)
Since ∂H(z)/∂α, ∂H(z)/∂Δi , and ∂H(z)/∂γi can be expressed as
 −1
∂H(z)   ATρ cTρ cρ β0 cTρ
= − I n 0 zI 2n −
∂α 0 Aρ bρ
  −1  
∂H(z)  T  Aρ bρ cρ bρ
= e1 0 zI 2n −
∂Δ1 0 Aρ −α
 −1  
∂H(z)   Aρ ei−1 eTi 0
= cρ 0 zI 2n − for i = 2, 3, · · · , n
∂Δi 0 Aρ bρ
  −1  
∂H(z)  T  Aρ bρ cρ 0
= ei 0 zI 2n − for i = 1, 2, · · · , n
∂γi 0 Aρ ei
(16.55)
substituting (16.54) and (16.55) into (16.53) leads to

  β0 cTρ
Mρ = β0 cρ bTρ P + tr[W ρ ] + 1 + αT W ρ α

   n  
 T T
 bρ  T
 0
+ bρ −α N 1 + 0 bρ M i (16.56)
−α i=2

 n  
 T
 0
+ ψ(γi ) 0 ei N i
i=1
ei

where W ρ is the observability Grammian of the state-space model in (16.15),


which can be obtained by solving the Lyapunov equation in (16.36). In
addition, matrices P , M i and N i can be obtained by solving the Lyapunov
equations
400 Generalized Transposed Direct-Form II Realization

T  
ATρ cTρ cρ ATρ cTρ cρ In 0
P = P +
0 Aρ 0 Aρ 0 0
T
Aρ ei−1 eTi Aρ ei−1 eTi cT c ρ 0
Mi = Mi + ρ
0 Aρ 0 Aρ 0 0
(16.57)
for i = 2, 3, · · · , n
 T  
Aρ bρ cρ Aρ bρ cρ ei eTi 0
Ni = Ni +
0 Aρ 0 Aρ 0 0
for i = 1, 2, · · · , n

16.5.2 l2 -Sensitivity of Equivalent State-Space Realization

(1) Improved l2 -Sensitivity Measure for General State-Space Models


The transfer function of the state-space model in (16.46) can be expressed as

H(z) = c(zI n − A)−1 b + d (16.58)

where
⎡ ⎤ ⎡ ⎤
a11 a12 · · · a1n b1
⎢ a21 a22 · · · a2n ⎥ ⎢ b2 ⎥  
⎢ ⎥ ⎢ ⎥
A = ⎢ .. .. .. .. ⎥ , b = ⎢ .. ⎥, c = c1 c2 · · · cn
⎣ . . . . ⎦ ⎣ . ⎦
an1 an2 · · · ann bn

The l2 -sensitivity measure for the state-space model in (16.46) is defined


as [11]
n  n    n   
 1  ∂H(z) 2 dz  1  ∂H(z) 2 dz
S=   +  
2πj |z|=1  ∂ail  z 2πj |z|=1  ∂bi  z
i=1 l=1 i=1

n      
 1  ∂H(z) 2 dz 1  ∂H(z) 2 dz
+   +  
2πj |z|=1  ∂cl  z 2πj |z|=1  ∂d  z
l=1
(16.59)
16.5 Analysis of l2 -Sensitivity 401

where
∂H(z) ∂H(z)
= g(z)ei eTl f (z), = g(z)ei
∂ail ∂bi
∂H(z) ∂H(z)
= eTl f (z), =1
∂cl ∂d
with
f (z) = (zI n − A)−1 b, g(z) = c(zI n − A)−1
Since coefficients 0 and ±1 can be realized precisely in the implementation
of FWL digital filters, the l2 -sensitivity is not affected by these coefficients.
Consequently, the sensitivity of individual elements of coefficient matrices A,
b and c should be changed to [7]
∂H(z) ∂H(z)
= ψ(ail )g(z)ei eTl f (z), = ψ(bi )g(z)ei
∂ail ∂bi
(16.60)
∂H(z) ∂H(z)
= ψ(cl )eTl f (z), = ψ(d)
∂cl ∂d
where
 
1 for ail =
 0, ±1 1 for bi =
 0, ±1
ψ(ail ) = , ψ(bi ) =
0 for ail = 0, ±1 0 for bi = 0, ±1
 
1 for cl =
 0, ±1 1 for d = 0, ±1
ψ(cl ) = , ψ(d) =
0 for cl = 0, ±1 0 for d = 0, ±1

Lemma 16.1
The improved l2 -sensitivity measure for the state-space model (A, b, c, d)n
in (16.46) is presented by [7]
 n
n   n
cT
SI = ψ(ail ) [ c 0 ] R(i, l) + ψ(bi )Wii
i=1 l=1
0 i=1
n
(16.61)

+ ψ(cl )Kll + ψ(d)
l=1

where Kll for l = 1, 2, · · · , n is the (l, l)th entry of the controllability


Grammian K c , Wii for i = 1, 2, · · · , n is the (i, i)th entry of the observability
Grammian W o , and matrices R(i, l), K c and W o are obtained by solving
the Lyapunov equations
402 Generalized Transposed Direct-Form II Realization
T
A ei eTl A ei eTl 0 0
R(i, l) = R(i, l) +
0 A 0 A 0 bbT
for i, l = 1, 2, · · · , n

K c = AK c AT + bbT , W o = A T W o A + cT c
The improved l2 -sensitivity measure given in (16.61) can be modified to two
novel forms so that the number of the Lyapunov equations to be solved is
reduced from n2 + 2 to n + 2. [8]

Theorem 16.1
The improved l2 -sensitivity measure in (16.61) can be expressed in the form
 n
n  n

el
SI = ψ(ail ) [ eTl 0 ] M (i) + ψ(bi )Wii
i=1 l=1 0 i=1
n
(16.62a)

+ ψ(cl )Kll + ψ(d)
l=1
where M (i) is obtained by solving the Lyapunov equations
   T
A bc A bc 0 0
M (i) = M (i) +
0 A 0 A 0 ei eTi (16.62b)

for i = 1, 2, · · · , n
Proof
Noting that
f (z)g(z) = (zI n − A)−1 bc(zI n − A)−1
  −1   (16.63)
  A bc 0
= In 0 zI 2n −
0 A In
and defining Φ(z) = f (z)g(z), (16.60) clearly implies that
   
1  ∂H(z) 2 dz 1 dz
  T
= ψ(ail ) el Φ(z)ei eTi ΦT (z −1 ) el

2πj |z|=1 ∂ail  z 2πj |z|=1 z
 
el
= ψ(ail ) [ eTl 0 ] M (i)
0
(16.64)
16.5 Analysis of l2 -Sensitivity 403

where
∞ 
 k   T  k
A bc 0 0 AT 0
M (i) =
0 A ei ei (bc)T AT
k=0

which yields the Lyapunov equations in (16.62b), hence the proof is


complete. 

Theorem 16.2
The improved l2 -sensitivity measure in (16.61) can be modified to
n  n    n
T 0
SI = ψ(ail ) [ 0 ei ] N (l) + ψ(bi )Wii
i=1
ei i=1
l=1
n
(16.65a)

+ ψ(cl )Kll + ψ(d)
l=1

where N (l) is obtained by solving the Lyapunov equations


 T  
A bc A bc el eTl 0
N (l) = N (l) +
0 A 0 A 0 0 (16.65b)
for l = 1, 2, · · · , n
Proof
From (16.60) and (16.63), it follows that
   
1  ∂H(z) 2 dz 1 dz
  = ψ(ail ) eiT ΦT (z −1 )el eTl Φ(z) ei

2πj |z|=1 ∂ail  z 2πj |z|=1 z
 
0
= ψ(ail ) [ 0 eTi ] N (l)
ei
(16.66)
where


k   T  k
AT 0 el el A bc
N (l) =
k=0
(bc)T AT 0 0 0 A

which yields the Lyapunov equations in (16.65b), hence the proof is


complete. 
404 Generalized Transposed Direct-Form II Realization

(2) l2 -Sensitivity Measure for Equivalent State-Space Realization


As for the implementation of an equivalent state-space realization
(Aρ , bρ , cρ , β0 )n in (16.15), the corresponding l2 -sensitivity measure, say
SIρ , can be evaluated directly by employing either of (16.61), (16.62a), and
(16.65a) with


⎪ 1 for (i, l) = (k, 1), k = 1, 2, · · · , n

⎨ 1 for (i, l) = (k, k + 1), k = 1, 2, · · · , n − 1
ψ(aρil ) =

⎪ ψ(γi ) for i = l = k, k = 2, 3, · · · , n


0 otherwise
ψ(bρi ) = 1 for i = 1, 2, · · · , n

ρ 1 for l=1
ψ(cl ) =
0 otherwise
(16.67)

16.6 Filter Synthesis


16.6.1 Computation of Roundoff Noise and l2 -Sensitivity
For given parameters {γi | i = 1, 2, · · · , n}, a concrete procedure for evalua-
ting the roundoff noise gain and the l2 -sensitivity measure for both a ρ-operator
transposed direct form II structure and its equivalent state-space realization is
summarized as the following steps:
1. Compute α = [α1 , α2 , · · · , αn ]T and β = [β 0 , β 1 , · · · , β n ]T using
   
1 α1 · · · αn P = 1 a1 · · · an
    (16.68)
β 0 β 1 · · · β n P = b0 b1 · · · bn
for given {γi | i = 1, 2, · · · , n}, {ai | i = 1, 2, · · · , n} and
{bi | i = 0, 1, · · · , n} where Δi = 1 for i = 1, 2, · · · , n.
2. Construct the corresponding equivalent state-space realization
(Aρ , bρ , cρ , β0 )n in (16.20).
3. Obtain the controllability Grammian K ρ by solving the Lyapunov
equation in (16.25).
4. Compute the l2 -scaling factors Δi for i = 1, 2, · · · , n via (16.28).
5. Find the l2 -scaled α = [α1 , α2 , · · · , αn ]T and β = [β0 , β1 , · · · , βn ]T
by using (16.24) and construct the l2 -scaled equivalent state-space
realization (Aρ , bρ , cρ , β0 )n in (16.15).
16.6 Filter Synthesis 405

6. Obtain the observability Grammian W ρ by solving the Lyapunov


equation in (16.36).
7. Compute the roundoff noise gains Jρ and ISρ via (16.45) and (16.49),
respectively.
8. Calculate the l2 -sensitivity measures Mρ and SI (say SIρ ) from (16.56)
and (16.61) with (16.67), respectively.

16.6.2 Choice of Parameters {γi | i = 1, 2, · · · , n}


It is assumed here that |γi | ≤ 1 for i = 1, 2, · · · , n. For a fixed-point
implementation of Bc bits, every γi must be truncated or rounded into a Bγ -bit
number (Bγ ≤ Bc ) of the form


v=± bp 2−p , bp = 0, 1 ∀ p (16.69)
p=1

unless v = ±1. Hence γi for i = 1, 2, · · · , n should take values within a


discrete space defined by
$ Bγ %
# 
−p
Sγ = {−1, 1} ± bp 2 , bp = 0, 1 ∀ p (16.70)
p=1

which contains (2Bγ +1 + 1) elements. Hereafter, it is assumed that all


parameters γi for i = 1, 2, · · · , n belong to Sγ , i.e., γi ∈ Sγ ∀ i. As a
result, these parameters do no contribute to the overall l2 -sensitivity measure.
However, they do cause roundoff noise unless they are trivial, i.e., if the
quantization error is generated after the product quantization for some γi .

16.6.3 Search of Optimal Vector γ = [γ1 , γ2 , · · · , γn ]T


Let S γ ∈ Rn×1 be the space in which γ takes values, that is,
S γ = {γ | γi ∈ Sγ ∀ i} (16.71)
It is obvious that Jρ and ISρ are functions of γ, respectively. Therefore, we can
consider the problem of minimizing either Jρ or ISρ with respect to vector γ.
That is  
γ Jρopt = arg min Jρ
γ ∈S γ
 opt  (16.72)
γ ISρ = arg min ISρ
γ ∈S γ
406 Generalized Transposed Direct-Form II Realization

Since Jρ and ISρ are highly nonlinear and nonconvex functions with respect
to γ, it is very difficult to find the optimal solutions. However, the problems
can be solved easily using exhaustive searching since the space S γ includes
(2Bγ +1 + 1)n elements where n is the filter order, and Bγ is the number
of bits for implementing {γi } with 4 to 8 bits typically. Hence by repeating
the procedure summarized
 opt  inSection 16.6.1 a finite number of times, we
opt 
can find the γ Jρ and γ ISρ eventually. For comparison purposes,
we define

γ δ = [1, 1, · · · , 1]T , γ z = [0, 0, · · · , 0]T (16.73)

Finally, we should mention that just like the roundoff noise gains Jρ
and ISρ , the l2 -sensitivity measures Mρ and SIρ are all function of γ, and
therefore the problem of minimizing either Mρ or SIρ with respect to γ is also
considered. That is  
γ Mρopt = arg min Mρ
γ ∈S γ
 opt  (16.74)
γ SIρ = arg min SIρ
γ ∈S γ
We can solve these problems
 by
 repeating
 optthe
 same procedure a finite number
of times and find the γ Mρopt and γ SIρ eventually.

16.7 Numerical Experiments


Consider a fourth Butterworth lowpass filter, with very narrow bandwidth,
described by

0.031239z 4 + 0.124956z 3 + 0.187434z 2 + 0.124956z


+0.031239
H(z) = 10−3
z 4 − 3.589734z 3 + 4.851276z 2 − 2.924053z + 0.663010
which has a normalized bandwidth 0.025, poles 0.931900 ± j0.136363,
0.862967 ± j0.052305 and zeros −1.000226, −1.000000 ± j0.000226,
−0.999774. Note that the poles are clustered around z = 1. It is assumed
here that Bγ = 4. Hence the optimal solutions of γ are found through
exhaustive searching within the set S γ in (16.71). Since γi ∈ Sγ ∀ i, we
set ψ(γi ) = 0 for i = 1, 2, 3, 4 to compute the l2 -sensitivity measures
Mρ and SIρ from (16.56) and (16.61) with (16.67), respectively, in the
simulation [5].
16.7 Numerical Experiments 407

Suppose that γ δ = [1, 1 , 1 , 1]T . In this case, matrix P in (16.6b) was


found to be ⎡ ⎤
1 −4 6 −4 1
⎢ 0 1 −3 3 −1 ⎥
⎢ ⎥
P =⎢ ⎢ 0 0 1 −2 1 ⎥

⎣ 0 0 0 1 −1 ⎦
0 0 0 0 1
By using (16.68), α and β were computed as
 T
α = 0.410266, 0.082074, 0.009297, 0.000500
 T
β = 10−3 0.031239, 0.249912, 0.749735, 0.999647, 0.499824
where Δi = 1 for i = 1, 2, 3, 4 and the equivalent state-space realization
(Aρ , bρ , cρ , β0 )n was constructed as
⎡ ⎤ ⎡ ⎤
0.589734 1 0 0 0.237096
⎢ −0.082074 1 1 0 ⎥ ⎢ ⎥
−3 ⎢ 0.747172 ⎥
Aρ = ⎢ ⎣ −0.009297 0 1 1 ⎦
⎥, bρ = 10 ⎣ 0.999357 ⎦
−0.000500 0 0 1 0.499808
 
cρ = 1 0 0 0 , β0 = 3.123898 × 10−5
The controllability Grammian K ρ was obtained by solving the Lyapunov
equation in (16.25) as
⎡ ⎤
5.128326 2.077916 0.358716 0.026271
⎢ 2.077916 0.893925 0.165774 0.013317 ⎥
K ρ = 10−2 ⎢ ⎣ 0.358716 0.165774 0.033821 0.003126 ⎦

0.026271 0.013317 0.003126 0.000363


and the l2 -scaling factors Δi for i = 1, 2, 3, 4 were computed via (16.28) as
   
Δ1 Δ2 Δ3 Δ4 = 0.226458, 0.417506, 0.194510, 0.103586
The coordinate transformation matrix T , the l2 -scaled α and β were then
found using (16.24) as
T = 102 diag{0.044158, 0.105767, 0.543759, 5.249357}
⎡ ⎤ ⎡ ⎤
1.811665 0.011036
⎢ 0.868073 ⎥ ⎢ 0.079297 ⎥
α=⎢ ⎥
⎣ 0.505557 ⎦ , β = 10−1 ⎢
⎣ 0.543567 ⎦

0.262375 2.623753
408 Generalized Transposed Direct-Form II Realization

and the l2 -scaled equivalent state-space realization (Aρ , bρ , cρ , β0 )n was


constructed from (16.15) as
⎡ ⎤
0.589734 0.417506 0 0
⎢ −0.196582 1 0.194510 0 ⎥
Aρ = ⎢ ⎣ −0.114488

0 1 0.103586 ⎦
−0.059417 0 0 1
⎡ ⎤
0.010470
⎢ 0.079026 ⎥
bρ = 10−1 ⎢ ⎣ 0.543409 ⎦

2.623671
 
cρ = 10−1 2.264581 0 0 0 , β0 = 3.123898 × 10−5

The controllability Grammian K ρ was derived from (16.26) as


⎡ ⎤
1.000000 0.970487 0.861329 0.608974
⎢ 0.970487 1.000000 0.953392 0.739347 ⎥
Kρ = ⎢ ⎣ 0.861329 0.953392 1.000000 0.892383


0.608974 0.739347 0.892383 1.000000

and the observability Grammian W ρ was obtained by solving the Lyapunov


equation in (16.36) as
⎡ ⎤
1.605135 −0.335077 −2.146692 0.336926
⎢ −0.335077 4.747654 −0.461734 −3.803713 ⎥
W ρ = 10−1 ⎢ ⎣ −2.146692 −0.461734

7.232309 −0.374583 ⎦
0.336926 −3.803713 −0.374583 7.526435

Consequently, the roundoff noise gains Jρ and ISρ were computed from
(16.45) and (16.49) as

Jρ = 8.442661, ISρ = 7.580816

respectively, where Ψ in (16.45) was set to Ψ = diag{0, 0, 0, 0}.


The l2 -sensitivity measures Mρ and SI (say SIρ ) were calculated from
(16.56) and (16.61) with (16.67) as

Mρ = 16.528854, SIρ = 41.019991

respectively.
16.8 Summary 409

Table 16.1 Performance comparison among verious γ


opt opt
γ Jρopt ISρ Mρopt SIρ

γδ 8.442661 7.580816 16.528854 41.019991


γz 8.349654 × 105 4.201743 × 105 6.333195 × 106 4.882296 × 106
 
γ Jρopt 4.512680 4.286420 8.958807 16.776958
 opt 
γ ISρ 4.512680 4.286420 8.958807 16.776958
 opt 
γ Mρ 5.101317 4.681570 7.149017 16.590286
 opt 
γ SIρ 4.567436 4.294351 7.735488 15.470568

After repeating the procedure summarized in Section 16.6.1 plenty of


times, we arrived at
⎡ ⎤ ⎡ ⎤
1.0000 1.0000
  ⎢ 0.8750 ⎥  opt  ⎢ 0.8750 ⎥
γ Jρopt = ⎢
⎣ 0.8750 ⎦
⎥, γ ISρ = ⎢ ⎣ 0.8750 ⎦

0.8750 0.8750
⎡ ⎤ ⎡ ⎤
0.9375 1.0000
  ⎢ 0.9375 ⎥  opt  ⎢ 0.9375 ⎥
γ Mρopt = ⎢⎣ 0.8750 ⎦
⎥, γ SIρ = ⎢ ⎣ 0.8750 ⎦

0.9375 0.8750

Detailed numerical results of applying the present technique to this example


are summarized in comparison with γ δ = [1, 1 , 1, 1]T and γ z = [0, 0, 0, 0]T
in Table 16.1.

16.8 Summary
In this chapter, the transposed direct-form II structure of an nth-order IIR digi-
tal filter using a set of special operators has been introduced, and its equivalent
state-space realization has been constructed. Moreover, the roundoff noise and
l2 -sensitivity have been analyzed for the generalized transposed direct-form II
structure and its equivalent state-space realization. Given a transfer function
and a set of n free parameters, a concrete procedure for evaluating the roundoff
noise gain has been presented and the l2 -sensitivity measure of each structure
has been addressed. Numerical experiments are presented to illustrate the
validity and effectiveness of the present techniques.
410 Generalized Transposed Direct-Form II Realization

References
[1] R. H. Middleton and G. C. Goodwin, “Improved finite word length
characteristics in digital control using delta operators,” IEEE Trans.
Autom. Control, vol. AC-31, no. 11, pp. 1015–1021, Nov. 1986.
[2] J. Kauraniemi, T. I. Laakso, I. Hartimo and S. J. Ovaska, “Delta operator
realizations of direct-form IIR filters,” IEEE Trans. Circuits Syst.-II,
vol. 45, no. 1, pp. 41–52, Jan. 1998.
[3] N. Wong and T.-S. Ng, “Roundoff noise minimization in a modified
direct-form delta operator IIR structure,” IEEE Trans. Circuits Syst.-II,
vol. 47, no. 12, pp. 1533–1536, Dec. 2000.
[4] N. Wong and T.-S. Ng, “A generalized direct-form delta operator-based
IIR filter with minimum noise gain and sensitivity,” IEEE Trans. Circuits
Syst.-II, vol. 48, no. 4, pp. 425–431, Apr. 2001.
[5] G. Li and Z. Zhao, “On the generalized DFIIt structure and its state-space
realization in digital filter implementation,” IEEE Trans. Circuits Syst.-I,
vol. 51, no. 4, pp. 769–778, Apr. 2004.
[6] S. Y. Hwang, “Minimum uncorrelated unit noise in state-space digital
filtering,” IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 4,
pp. 273–281, Aug. 1977.
[7] C. Xiao, “Improved L2 -sensitivity for state-space digital system,” IEEE
Trans. Signal Process., vol. 45, no. 4, pp. 837–840, Apr. 1997.
[8] Y. Hinamoto and A. Doi, “Simplified computation of l2 -sensitivity for
1-D and a class of 2-D state-space digital filters considering 0 and
±1 elements,” Proc. SIGMAP 2013 – Int. Conf. Signal Process. and
Multimedia Appl., pp. 53–58, Reykjavik, Iceland, Jul. 2013.
[9] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Engle-
wood Cliffs, NJ: Prentice Hall, 1975.
[10] R. H. Middleton and G. C. Goodwin, Digital Control and Estimation:
A Unified Approach, Englewood Cliffs, NJ: Prentice Hall, 1990.
[11] W.-Y. Yan and J. B. Moore, “On L2 -sensitivity minimization of lin-
ear state-space systems,” IEEE Trans. Circuits Syst. I, vol. 39, no. 8,
pp. 641–648, Aug. 1992.
17
Block-State Realization of IIR Digital Filters

17.1 Preview
An important issue involved in the implementation of IIR digital filters using
fixed-point arithmetic is reducing roundoff noise at the filter’s output. As is
well known, roundoff noise is critically dependent on the internal structure
of IIR digital filters. In this regard, state-space models for digital filters
provide a suitable platform in which the internal structure of a filter can be
explored so as to minimize its roundoff noise by choosing an appropriate
linear transformation for state-space coordinates without altering the input-
output characteristic of the filter. Effective techniques for constructing an
optimal filter structure that minimizes the roundoff noise at the filter output
subject to l2 -scaling constraints have been explored [1–3]. However, such a
realization requires (n + 1)2 multiplications to compute each output sample
for an nth-order filter, an increase of n2 multiplications over canonical direct-
form structures. This is a major disadvantage especially for high-order digital
filters as its effect on data throughput rate becomes much greater when n
is large. Alternatively, block-state realization of IIR digital filters has been
proposed as a method of increasing data throughput rate and reducing finite-
word-length (FWL) effects [4–6]. In the block-state realization, we implement
single-input/single-output (SISO) state-space model by dividing the scalar
input data stream into data vectors of length L, process the data vectors
with an L-input/L-output state-space model, say (L, L) system, and then
reconstruct the scalar output stream from the processed data vectors [5].
At the input of an (L, L) system, a serial-in/parallel-out register converts
the scalar input into a vector input. At the output of the (L, L) system, a
parallel-in/serial-out register converts the vector output into a scalar output.
Obviously, the scalar sample throughput rate is L times the fundamental clock
rate of the (L, L) system. It has
√ been shown [6] that for an nth-order filter, the
optimal blocklength is L = 2n which is noninteger and requires rounding

411
412 Block-State Realization of IIR Digital Filters

to the closest integer. This optimal block length minimizes average √ number of
multiplications needed to compute each output sample to (2+ 2)n+1/2 [6].
Another important and related issue is reducing the effects of quatizing
filter coefficients. It is of importance to note that coefficient quantization
usually alters filter characteristics. For instance, a stable filter designed under
the assumption of infinite precision may become unstable after coefficient
quantization. This has motivated the study of coefficient sensitivity and its
minimization for digital filters. Several techniques has also been explored
to analyze l2 -sensitivity and to synthesize the state-space filter structure that
minimizes l2 -sensitivity for digital filters [7, 8]. In addition, the minimization
problem of l2 -sensitivity subject to l2 -scaling constraints has been treated for
state-space digital filters [9–11]. It is known that the use of scaling constraints
can be beneficial for suppressing overflow oscillations [2, 3].
In this chapter, we shall consider the block-state realization for a given
SISO state-space model, and examine some of the properties of the block-
state realization. Second, we analyze the roundoff noise in the block-state
realization and minimize the average roundoff noise gain subject to l2 -scaling
constraints. Third, we present a quantitative analysis on l2 -sensitivity for the
block-state realization of state-space digital filters. Following the analysis,
we study two techniques for minimizing a sensitivity measure known as
average l2 -sensitivity subject to l2 -scaling constraints for the block-state
realization of state-space digital filters. One of the techniques is based on
a Lagrange function, while the other relies on an efficient quasi-Newton
algorithm. Finally, numerical experiments are presented to demonstrate the
validity and effectiveness of the techniques addressed in this chapter.

17.2 Block-State Realization


Consider a single-input/single-output (SISO) state-space model (A, b, c, d)n
described by
x(k + 1) = Ax(k) + bu(k)
(17.1)
y(k) = cx(k) + du(k)
where x(k) is an n × 1 state-variable vector, u(k) is a scalar input, y(k) is a
scalar output, and A, b, c, and d are n × n, n × 1, 1 × n, and 1 × 1 real constant
matrices, respectively. The SISO state-space model in (17.1) is assumed to be
stable, controllable and observable. A block-diagram of the state-space model
in (17.1) is depicted in Figure 17.1. Note that in the most pessimistic case
where all elements of coefficient matrices in (17.1) are nontrivial, (n + 1)2
17.2 Block-State Realization 413

x(k+1) x(k)
u(k) b z -1In c y(k)

Figure 17.1 A state-space model.

multiplications are required to compute the output y(k) during each sample
interval.
From (17.1), it follows that for some integer L > 0,

x(kL + i) = Ai x(kL) + Ai−1 bu(kL) + · · ·


+ Abu(kL + i − 2) + bu(kL + i − 1)

y(kL + i) = cAi x(kL) + cAi−1 bu(kL) + · · · (17.2)

+ cbu(kL + i − 1) + du(kL + i)
i = 0, 1, · · · , L − 1

which leads to
x(kL + L) = AL x(kL)
⎡ ⎤
u(kL)
 ⎢


⎥u(kL + 1)
+ AL−1 b AL−2 b · · · b ⎢ ⎥ ..
⎣ ⎦ .
u(kL + L − 1)
⎡ ⎤ ⎡ ⎤
y(kL) c
⎢ y(kL + 1) ⎥ ⎢ cA ⎥
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ = ⎢ .. ⎥ x(kL)
⎣ . ⎦ ⎣ . ⎦
y(kL + L − 1) cAL−1
⎡ ⎤⎡ ⎤
d 0 ··· 0 u(kL)
⎢ .. . ⎥ ⎢ u(kL + 1) ⎥
⎢ cb d . .. ⎥
+⎢⎢ ⎥⎢ ⎥
.. ⎥⎢ .. ⎥
. 0⎦ ⎣ ⎦
.. ..
⎣ . . .
cA L−2
b · · · cb d u(kL + L − 1)
(17.3)
414 Block-State Realization of IIR Digital Filters

Equation (17.3) can be written as

x̂(k + 1) = Âx̂(k) + B̂u(k)


(17.4a)
ŷ(k) = Ĉ x̂(k) + D̂u(k)
where
⎡ ⎤ ⎡ ⎤
u(kL) y(kL)
⎢ ⎥
u(kL + 1) ⎢ y(kL + 1) ⎥
⎢ ⎥ ⎢ ⎥
x̂(k) = x(kL), u(k) = ⎢ .. ⎥, ŷ(k) = ⎢ .. ⎥
⎣ . ⎦ ⎣ . ⎦
u(kL + L − 1) y(kL + L − 1)
 
 = AL , B̂ = AL−1 b AL−2 b · · · b

⎡ ⎤ ⎡ ⎤
c d ··· 00
⎢ ⎥ ⎢ .. .⎥
⎢ cA ⎥ ⎢ cb d . .. ⎥
Ĉ = ⎢ ..⎥, ⎢
D̂ = ⎢ ⎥
⎣ ⎦ .. .. .. ⎥
. ⎣ . . . 0⎦
cAL−1 cAL−2 b · · · cb d
(17.4b)

From (17.4a), it follows that a total of


L(L + 1)
m = n2 + 2nL + (17.5)
2
multiplications are required in each block of L output samples or, equivalently,
an average of
m n2 L+1
= + 2n + (17.6)
L L 2
multiplications for each output sample. If the average measure, m/L, in (17.6)
is minimized with respect to L, then the optimal blocklength is obtained as

L = 2n (17.7)

which is noninteger and requires rounding to the closest integer. By substitu-


ting (17.7) into (17.6), the minimum value of m/L becomes
m √ 1
= 2+ 2 n+ (17.8)
L min 2
17.2 Block-State Realization 415

This result compares favorably with the processing complexity for the
canonical forms of the state-space model in (17.1), which require (2n + 1)
multiplications per output sample, and reveals that the system in (17.4a)
enables us to perform fast processing for high-order digital filters.
In the rest of this chapter, the L-input/L-output state-space model
described by (17.4a) is referred to as block-state realization, (Â, B̂, Ĉ, D̂)n ,
that is generated by the SISO state-space model (A, b, c, d)n in (17.1). In
this way, (17.4b) defines a mapping (A, b, c, d)n −→ (Â, B̂, Ĉ, D̂)n that
transforms the state-space model in (17.1) to the block-state realization in
(17.4a). This block-state realization corresponds to the time-invariant case for
the block-state realization of periodically time-varying digital filter, that was
derived by Meyer and Burrus [12].
The system in (17.4a) can be realized by an L-input/L-output IIR digital
filter implemented with serial-in/parallel-out and parallel-in/serial-out regis-
ters, as shown in Figure 17.2. It is obvious that the serial input and output
sample rates are L times the fundamental clock rate of the L-input/L-output
system. The internal structure of the block-state realization for block length
of L = 3 is illustrated by the flow graph in Figure 17.3.
We now examine some of the properties of the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n , which is defined by (17.4b).

Lemma 17.1
Since  = AL , the eigenvalues of matrix  are the Lth power of those of
matrix A, i.e., the poles of the block-state realization are the Lth power of the
poles of the associated SISO state-space model.

u(k) REGISTER

^ ^ ^ ^
(A, B, C, D)n

REGISTER y(k)
Figure 17.2 Block-state realization using serial-in/parallel-out and parallel-in/serial-out
registers.
416 Block-State Realization of IIR Digital Filters

u(3k) u(3k+1) u(3k+2) y(3k) y(3k+1) y(3k+2)

cb

cAb

cb

c cA cA2
A2b Ab b

z-1In
A3

Figure 17.3 Flow graph structure of a block-state realization for block length of three.

Lemma 17.2
Since  = AL , the dimension of the state space of the block-state realization
is the same as that of the associated SISO state-space model.

Theorem 17.1: Controllability Invariance


The controllability is invariant under the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n .

Proof
Let the controllability matrix for (A, b, c, d)n be denoted by
 
V n = b Ab · · · An−1 b
17.2 Block-State Realization 417

and let the controllability matrix for (Â, B̂, Ĉ, D̂)n be denoted by
 
n−1
V̂ n = B̂ ÂB̂ · · · Â B̂

Applying the Cayley-Hamilton theorem to matrix A, all columns of V̂ n can


be expressed as linear combinations of the columns of V n , i.e.,

rank [V̂ n ] ≤ rank [V n ] (17.9)

Alternatively, all the columns of V n are included among the columns of


V̂ n , i.e.,
rank [V n ] ≤ rank [V̂ n ] (17.10)
Hence
rank [V n ] = rank [V̂ n ] (17.11)
that completes the proof of the theorem. 

Theorem 17.2: Observability Invariance


The observability is invariant under the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n .

Proof
Due to the duality, the proof of this theorem is essentially the same as that of
Theorem 17.1. 

Corollary 17.1: Irreducibility Invariance


(Â, B̂, Ĉ, D̂)n is the minimal realization if and only if the associated SISO
state-space model (A, b, c, d)n is minimal, i.e., controllable and observable.

Theorem 17.3: Equivalence Invariance


(T −1 AT , T −1 b, cT , d)n → (T −1 ÂT , T −1 B̂, ĈT , D̂)n if and only if
(A, b, c, d)n → (Â, B̂, Ĉ, D̂)n where T is an n × n nonsingular matrix.

Proof
This proof follows directly from the definition of the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n given by (17.4b). That is, Â = AL if and only if T −1 ÂT =
T −1 AL T = (T −1 AT )L , etc. 
418 Block-State Realization of IIR Digital Filters

Theorem 17.4: Controllability Grammian Invariance


The controllability Grammian is invariant under the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n .

Proof
Let the controllability Grammians for (A, b, c, d)n and (Â, B̂, Ĉ, D̂)n be
denoted by

 ∞
 k k
Kc = Ak b(Ak b)T and K̂ c = Â B̂(Â B̂)T
k=0 k=0

respectively. Then it follows that


T 2 2
K̂ c = B̂ B̂ + ÂB̂(ÂB̂)T + Â B̂(Â B̂)T + · · ·

 (17.12)
= Ak b(Ak b)T = K c
k=0

where
 
B̂ = AL−1 b AL−2 b · · · Ab b
 
ÂB̂ = A2L−1 b A2L−2 b · · · AL+1 b AL b
2  
 B̂ = A3L−1 b A3L−2 b · · · A2L+1 b A2L b
..
.
This completes the proof of the theorem. 

Theorem 17.5: Observability Grammian Invariance


The observability Grammian is invariant under the mapping (A, b, c, d)n →
(Â, B̂, Ĉ, D̂)n .

Proof
Suppose that the observability Grammians for (A, b, c, d)n and (Â, B̂,
Ĉ, D̂)n are denoted by

 ∞
 k k
Wo = (cAk )T cAk and Ŵ o = (Ĉ Â )T Ĉ Â
k=0 k=0
17.3 Roundoff Noise Analysis and Minimization 419

respectively. Then it follows that


T 2 2
Ŵ o = Ĉ Ĉ + (Ĉ Â)T Ĉ Â + (Ĉ Â )T Ĉ Â + · · ·

 (17.13)
= (cAk )T cAk = W o
k=0

where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
c cAL cA2L
⎢ ⎥ cA ⎢ cAL+1 ⎥ ⎢cA2L+1 ⎥
⎢ ⎥ ⎢ ⎥ 2 ⎢ ⎥
Ĉ = ⎢ ⎥, .. Ĉ Â = ⎢ .. ⎥, Ĉ Â = ⎢ .. ⎥, ···
⎣ ⎦ . ⎣ . ⎦ ⎣ . ⎦
L−1
cA cA2L−1 cA3L−1

This completes the proof of the theorem. 

17.3 Roundoff Noise Analysis and Minimization


17.3.1 Roundoff Noise Analysis
In what follows, we use the following symbols.

σ2 : Variance of the noise generated by a single scalar roundoff


operation.
σy2 : Variance of the roundoff noise in the output of a SISO
state-space model.
σ̂y2 : Variance of the roundoff noise in the single output of a
block-state realization.
σŷ2i : Variance of the roundoff noise in the ith output of the
L-input/L-output system in the block-state realization.
It is assumed that roundoff is carried out only at the outputs of state variable
summing nodes, and at the outputs of the summing nodes at the filter’s output.
The effects of roundoff noise is modelled as stationary white noises w(k) and
v(k) with zero mean and covariance matrices σ 2 I n and σ 2 I L , respectively,
and these noise sources are introduced into the state and output equations as

x̂(k + 1) = Âx̂(k) + B̂u(k) + w(k)


(17.14)
ŷ(k) = Ĉ x̂(k) + D̂u(k) + v(k)
420 Block-State Realization of IIR Digital Filters

As a result, the autocorrelation matrix of the vector output ŷ(k) can be


written as
Rŷ (l) = E{ŷ(k + l)ŷ(k)T }
∞ 
 k+l k T
(17.15)
= Ĉ Â (Ĉ Â ) + I L δ(l) σ 2
k=0

whose elements are given by


∞ 
  
(k+l)L+i−1 kL+j−1 T
Rŷ (l) i,j = cA (cA ) + δ(i − j) σ 2 (17.16)
k=0

for i, j = 1, 2, · · · , L where l ≥ 0. Hence, the variances associated with


individual outputs can be expressed as
∞ 
  
σŷ2i = Rŷ (0) i,i = cAkL+i−1 (cAkL+i−1 )T + 1 σ 2 (17.17)
k=0

for i = 1, 2, · · · , L.
When the individual outputs are combined by a parallel-in/serial-output
register to form a single output, the resulting noise will no longer be stationary.
The autocorrelation function of the single output can be written as
 
Ry (kL+i−1, lL+j −1) = E{y(kL+i−1)y(lL+j −1)} = Rŷ (k−l) i,j
(17.18)
for i = 1, 2, · · · , L. Hence the variance of the nonstationary output noise is
given by
σ̂y2 (kL + i − 1) = σŷ2i (17.19)
for i = 1, 2, · · · , L. From (17.19), it is observed that the noise variance is
periodic with period L.
We now examine the relationship between the roundoff noise in the block-
state realization and the roundoff noise in the associated SISO state-space
model.
For the SISO state-space model (A, b, c, d)n in (17.1), the roundoff noise
is given by [1] ∞ 

σy2 = cAk (cAk )T + 1 σ 2
k=0 (17.20)
  2
= tr[W o ] + 1 σ
17.3 Roundoff Noise Analysis and Minimization 421

where W o is the observability Grammian of the SISO state-space model in


(17.1), which can be obtained by solving the Lyapunov equation

W o = AT W o A + cT c

We remark that the formula in (17.20) can also be obtained from (17.17) by
setting L = 1. Noting that

cAk (cAk )T ≥ 0


cAkL+i−1 (cAkL+i−1 )T = cAi−1 (cAi−1 )T + cAL+i−1 (cAL+i−1 )T
k=0
+ cA2L+i−1 (cA2L+i−1 )T + · · ·
(17.21)
it follows that

 ∞

kL+i−1 kL+i−1 T
cA (cA ) ≤ cAk (cAk )T (17.22)
k=0 k=0

for i = 1, 2, · · · , L. This in conjunction with (17.17), (17.19), (17.20) and


(17.22) implies that
σ̂y2 (kL + i − 1) ≤ σy2 (17.23)
holds for any k and i = 1, 2, · · · , L, or equivalently,

σ̂y2 (k) ≤ σy2 (17.24)

holds for all k. This shows that the variance of the roundoff noise for block-
state realization never exceeds that of the associated SISO state-space model.
Let the average roundoff noise (averaged over one block period) in the
output of a block-state realization be defined from (17.19) by
L L
1  2 1  2
(σ̂y2 )ave = σ̂y (kL + i − 1) = σŷi (17.25)
L L
i=1 i=1

Since
 ∞
L  ∞

kL+i−1 kL+i−1 T
cA (cA ) = cAk (cAk )T (17.26)
i=1 k=0 k=0
422 Block-State Realization of IIR Digital Filters

we can deduce from (17.17) that


 ∞

1 
2
(σ̂y )ave = cA (cA ) + 1 σ 2
k k T
L
k=0
(17.27)
 
1
= tr[W o ] + 1 σ 2
L
On comparing (17.27) with (17.20), it can be seen that in block-state realization
the average roundoff noise variance from internal roundoff noise sources is
reduced by a factor of the block length L.

17.3.2 Roundoff Noise Minimization Subject to


l2 -Scaling Constraints
If a coordinate transformation defined by
x(k) = T −1 x(k) (17.28)
is applied to the SISO state-space model in (17.1), the new realization
(A, b, c, d)n can be characterized by
A = T −1 AT , b = T −1 b, c = cT (17.29)
For this realization, the average roundoff noise variance in (17.27) is changed
to  
2 1
σ̂y (T )ave = tr[T W o T ] + 1 σ 2
T
(17.30)
L
The controllability Grammian K̂ c for (Â, B̂, Ĉ, D̂)n described by (17.4a)
plays an important role in the dynamic-range scaling of the state-variable
vector x̂(k). Theorem 17.4 states that the controllability Grammian is invariant
under the mapping (A, b, c, d)n → (Â, B̂, Ĉ, D̂)n , i.e., K c = K̂ c where
K c can be obtained by solving the Lyapunov equation
K c = AK c AT + bbT
Thus, if the SISO state-space model (A, b, c, d)n in (17.1) is scaled, the
resulting block-state realization (Â, B̂, Ĉ, D̂)n will automatically be scaled
as well.
With an equivalent realization as specified in (17.29), the controllability
Grammian assumes the form
K c = T −1 K c T −T (17.31)
17.4 l2 -Sensitivity Analysis and Minimization 423

If l2 -scaling constraints are imposed on the new state-variable vector x(k)


defined by (17.28), it is required that

(T −1 K̂ c T −T )ii = (T −1 K c T −T )ii = 1 for i = 1, 2, · · · , n (17.32)

The problem being considered here is to obtain an n×n coordinate transforma-


tion matrix T that minimizes tr[T T W o T ] in (17.30) subject to the l2 -scaling
constraints in (17.32). This problem can readily be solved by applying the
technique in Section 15.2.2. In short, the solution of this problem is given by

1 
n 1
− 12
 1 1 1
QZ T
2 4
T =√ θi Wo W o2 K c W o2 (17.33)
n
i=1

where θi2 for i = 1, 2, · · · , n are the eigenvalues of K c W o , matrix Q is


derived from the eigenvalue-eigenvector decomposition
 1 1 1
= Q diag{θ1 , θ2 , · · · , θn }QT
2
W o K cW o
2 2

and matrix Z is an n × n orthogonal matrix such that

ZΛ−2 Z T ii
= 1 for i = 1, 2, · · · , n

which can be obtained by numerical manipulation [3, p. 278] where

Λ = diag{λ1 , λ2 , · · · , λn }
1
θ1 + θ 2 + · · · + θ n 2
λi = for i = 1, 2, · · · , n
nθi

17.4 l2 -Sensitivity Analysis and Minimization


17.4.1 l2 -Sensitivity Analysis
The transfer function of the block-state realization in (17.4a) is given by

H(z) = Ĉ(zI n − Â)−1 B̂ + D̂ (17.34)

whose (i, j)th element is described by

Hij (z) = ci (zI n − Â)−1 bj + dij (17.35)


424 Block-State Realization of IIR Digital Filters

where
 
B̂ = b1 b2 · · · bL
⎡ ⎤
c1 ⎧
⎢ c2 ⎥ ⎪
⎨ 0 for i<j
⎢ ⎥
Ĉ = ⎢ .. ⎥ , dij = d for i=j
⎣ . ⎦ ⎪
⎩ i−j−1
cA b for i>j
cL
We are now in a position to define the l2 -sensitivity of the block-state
realization in (17.4a).

Definition 17.1
Let X be an m × n real matrix and let f (X) be a scalar complex function of
X, differentiable with respect to all entries of X. The sensitivity function of
f (X) with respect to X is then defined as
∂f (X) ∂f (X)
SX = , (S X )ij = (17.36)
∂X ∂xij
where xij denotes the (i, j)th entry of matrix X.

Definition 17.2
Let X(z) be an m × n complex matrix-valued function of a complex variable
z and let xpq (z) be the (p, q)th entry of X(z). The l2 -norm of X(z) is then
defined as
⎡ ⎤1
 2π  m  n 2
1  
jω 2
X(z)2 = ⎣  xpq (e ) dω ⎦
2π 0
p=1 q=1
(17.37)
   !1
2
1 dz
= tr X(z)X H (z)
2πj |z|=1 z
From Definitions 17.1 and 17.2, the l2 -sensitivity measure for the subsystem
in (17.35) is defined by
" " " " " " " "
" ∂Hij (z) "2 " ∂Hij (z) "2 " ∂Hij (z) "2 " ∂Hij (z) "2
Sij = "" " + " " + " " + " "
∂ Â "2 " ∂bj "2 " ∂cTi "2 " ∂dij "2
" "2 " "2 " "2
= "[f j (z)g i (z)] T "2 + "g iT (z)"2 + "f j (z)"2 + uo (i − j)
(17.38)
17.4 l2 -Sensitivity Analysis and Minimization 425

where
f j (z) = (zI n − Â)−1 bj = (zI n − AL )−1 AL−j b

g i (z) = ci (zI n − Â)−1 = cAi−1 (zI n − AL )−1



1 for i ≥ 0
uo (i) =
0 for i < 0
In the rest of this section, f j (z) and g i (z) are referred to as intermediate
functions.
Using simple algebraic manipulations, the l2 -sensitivity measure in
(17.38) can be expressed as

   T
Sij = tr N ij (I n ) + cAkL+i−1 cAkL+i−1
k=0

(17.39)
 T
+ A(k+1)L−j b A(k+1)L−j b + uo (i − j)
k=0

where

1 dz
N ij (I n ) = [f j (z)g i (z)] T f j (z −1 )g i (z −1 )
2πj |z|=1 z

A closed-form solution for evaluating N ij (I n ) will be deduced shortly.


We note that each single output y(kL + i − 1) for i = 1, 2, · · · , L in the
output vector ŷ(k) is generated by the subsystem
 
H i (z) = Hi1 (z), Hi2 (z), · · · , HiL (z)
(17.40)
−1
= ci zI − Â B̂ + di
 
where di = di1 , di2 , · · · , diL . From this in conjunction with (17.39), the
l2 -sensitivity measure for the subsystem in (17.40) is found to be
L
 L
 ∞
L 
   T
Si = Sij = tr N ij (I n ) + cAkL+i−1 cAkL+i−1
j=1 j=1 j=1 k=0
(17.41)

 T
+ Ak b Ak b + i
k=0
426 Block-State Realization of IIR Digital Filters

for i = 1, 2, · · · , L. As a result, the overall l2 -sensitivity for the block-state


realization in (17.34) can be expressed as
L
  L
L  ∞
L 
   T
S= Si = tr N ij (I n ) + cAk cAk
i=1 i=1 j=1 j=1 k=0
(17.42)
L 
 ∞
T L(L + 1)
+ Ak b Ak b +
2
i=1 k=0

which is equivalent to
L 
 L
      L(L + 1)
S= tr N ij (I n ) + L tr W o + L tr K c + (17.43)
2
i=1 j=1

where K c and W o are the controllability and observability Grammians of the


system in (17.1), respectively.
We now define the average l2 -sensitivity (over one block period of
length L) in the output of a block-state realization as
L
1 S
(Si )ave = Si = (17.44)
L L
i=1

which in conjunction with (17.43) gives


L L
1        L+1
(Si )ave = tr N ij (I n ) + tr W o + tr K c + (17.45)
L 2
i=1 j=1

For comparison purpose, the l2 -sensitivity measure for the SISO state-space
model in (17.1) is found to be (Sections 12.2 and 12.3)
     
So = tr N (I n ) + tr W o + tr K c + 1 (17.46)

where

1 dz
N (I n ) = [f (z)g(z)]T f (z −1 )g(z −1 )
2πj |z|=1 z

f (z) = (zI n − A)−1 b, g(z) = c(zI n − A)−1


17.4 l2 -Sensitivity Analysis and Minimization 427

and for any n×n symmetric positive-definite matrix P , the Grammian N (P )


can be obtained by solving the Lyapunov equation
# $T # $  −1
A bc A bc P 0
Y = Y +
0 A 0 A 0 0

and then taking the lower-right n × n block of Y as N (P ), namely,


# $
  0
N (P ) = 0 I n Y
In

To identify an optimal internal structure of a given IIR digital filter that


achieves minimum average l2 -sensitivity, we examine a coordinate transfor-
mation defined by (17.28). When the coordinate transformation defined by
(17.28) is applied to the SISO state-space model in (17.1), the new realization
associated with state x(k), denoted by (A, b, c, d)n , is related to the original
realization as
K c = T −1 K c T −T , W o = T T W oT
(17.47)
f j (z) = T −1 f j (z), g i (z) = g i (z) T

and the canonical-state to block-state mapping is given by

(T −1 AT , T −1 b, cT , d)n → (T −1 ÂT , T −1 B̂, ĈT , D̂)n (17.48)

Moreover, the Grammian N ij (I n ) which is introduced in (17.39) is trans-


formed into N ij (I n ) as follows:

1 dz
N ij (I n ) = [ f (z)g i (z) ]T f j (z −1 )g i (z −1 )
2πj |z|=1 j z (17.49)
= T T N ij (P ) T

where
P = TTT

1 dz
N ij (P ) = [f j (z)g i (z)] T P −1 f j (z −1 )g i (z −1 )
2πj |z|=1 z
428 Block-State Realization of IIR Digital Filters

It is noted that

f j (z)g i (z) = T −1 f j (z)g i (z) T


 −1 # $
 −1  zI n − AL −AL−j bcAi−1 0
= T 0
0 zI n − AL T
(17.50)
If we denote the observability Grammian of the composite system f j (z)g i (z)
in (17.50) by Y ij , matrix N ij (P ) can be obtained by solving the Lyapunov
equation
 T 
AL AL−j bcAi−1 AL AL−j bcAi−1
Y ij = Y ij
0 AL 0 AL
 (17.51)
P −1 0
+
0 0

and then taking the lower-right n × n block of Y ij as N ij (P ), namely,


# $
  0
N ij (P ) = 0 I n Y ij (17.52)
In

Therefore, under the coordinate transformation x(k) = T −1 x(k) defined by


(17.28), the average l2 -sensitivity measure in (17.45) becomes

L L
1   T   
Si (T )ave = tr T N ij (T T T )T + tr T T W o T
L
i=1 j=1 (17.53)
  L+1
+ tr T −1 K c T −T +
2
For comparison purpose, it follows from (17.46) that

So (T ) = tr[T T N (T T T )T ] + tr[T T W o T ] + tr[T −1 K c T −T ] + 1 (17.54)

From (17.4b) and (17.29), it can be shown that the transfer functions
H(z) in (17.34) and Hij (z) in (17.35) are invariant under the coordinate
transformation defined by (17.28).
17.4 l2 -Sensitivity Analysis and Minimization 429

17.4.2 l2 -Sensitivity Minimization Subject to


l2 -Scaling Constraints
17.4.2.1 Method 1: using a Lagrange function
We now consider the problem of minimizing the average l2 -sensitivity measure
in (17.53) subject to l2 -scaling constraints in (17.32). Since the measure in
(17.53) can be expressed in terms of matrix P = T T T as
L L
1        L+1
Si (P )ave = tr N ij (P )P + tr W o P + tr K c P −1 +
L 2
i=1 j=1
(17.55)
the problem we deal with is a constrained nonlinear optimization problem
where the variable is matrix P .
It is important that the coefficient sensitivity defined above be minimized
subject to constraints so that input-to-state energy-flow is appropriately scaled
which in analytic term is known as l2 -scaling. If we sum up the n l2 -scaling
constraints in (17.32), then we have

tr[T −1 K c T −T ] = tr[K c P −1 ] = n (17.56)

Consequently, the problem of minimizing (17.53) subject to the constraints in


(17.32) can be relaxed into the following problem:

minimize Si (P )ave in (17.55) with respect to P


(17.57)
subject to tr[K c P −1 ] = n

We now address problem (17.57) as the first step of our solution procedure.
To this end, we define the Lagrange function of the problem as
L L
1 
J(P , λ) = tr[N ij (P )P ] + tr[W o P ] + tr[K c P −1 ]
L
i=1 j=1
(17.58)
L+1
+ + λ tr[K c P −1 ] − n
2
where λ is a Lagrange multiplier. It is well known that the solution of
problem (17.57) must satisfy the Karush-Kuhn-Tucker (KKT) conditions
∂J(P , λ)/∂P = 0 and ∂J(P , λ)/∂λ = 0 where
430 Block-State Realization of IIR Digital Filters

L L L L
∂J(P , λ) 1  1 
= N ij (P ) + W o − P −1 M ij (P )P −1
∂P L L
i=1 j=1 i=1 j=1

− (λ + 1)P −1 K c P −1 (17.59)

∂J(P , λ)
= tr[K c P −1 ] − n
∂λ
The matrices M ij (P ) in (17.59) are obtained by solving the Lyapunov
equations
  T
AL AL−j bcAi−1 AL AL−j bcAi−1
X ij = X ij
0 AL 0 AL
(17.60)
# $
0 0
+
0 P
and then taking the upper-left n × n block of X ij as M ij (P ), namely,
# $
  In
M ij (P ) = I n 0 X ij (17.61)
0
The KKT conditions in (17.59) can be expressed compactly as
P F (P )P = G(P , λ), tr[K c P −1 ] = n (17.62)
where
L L
1 
F (P ) = N ij (P ) + W o
L
i=1 j=1

L L
1 
G(P , λ) = M ij (P ) + (λ + 1)K c
L
i=1 j=1
Note that the first equation in (17.62) is highly nonlinear with respect to P .
An effective approach to solving the first equation in (17.62) is to relax it into
the recursive second-order matrix equation
P k+1 F (P k )P k+1 = G(P k , λk+1 ) (17.63)
where P k is assumed to be known from the previous recursion. Note that if
the matrix sequence {P k } converges to its limit matrix, say P , then (17.63)
17.4 l2 -Sensitivity Analysis and Minimization 431

converges to the first equation in (17.62) as k goes to infinity. The solution


P k+1 of (17.63) is then given by
1 1 1 1 1
P k+1 = F (P k )− 2 F (P k ) 2 G(P k , λk+1 )F (P k ) 2 2 F (P k )− 2 (17.64)

In order to derive a recursive formula for the Lagrange multiplier λ, we use


(17.62) to write
L L
  1   
tr P F (P ) = tr M ij (P )P −1 + n λ + 1 (17.65)
L
i=1 j=1

which naturally suggests the recursion for λ


L
L 
  1  
tr P k F (P k ) − tr M ij (P k )P −1
k
L
i=1 j=1
λk+1 = −1 (17.66)
n
where P 0 is the initial estimate. This iteration process continues until
    
Si (P k+1 )ave − Si (P k )ave  + n − tr K c P −1  < ε (17.67)
k+1

is satisfied where ε > 0 is a prescribed tolerance.


If the iteration is terminated at step k, then we take P = P k as the solution.
Since P = T T T , the optimal T assumes the form
1
T = P 2U (17.68)

where P 1/2 is the square root of the matrix P obtained above, and U is an
n × n orthogonal matrix.
As the second step of the solution procedure, we now turn our attention
to the construction of the optimal coordinate transformation matrix T that
solves the problem of minimizing (17.53) subject to the constraints in (17.32).
To this end, we examine a procedure for determining the n × n orthogonal
matrix U in (17.68) in order for the nonsingular matrix T to satisfy the l2 -
scaling constraints in (17.32).
From (17.47) and (17.68), it follows that
1 1
K c = T −1 K c T −T = U T P − 2 K c P − 2 U (17.69)

In order to find an n × n orthogonal matrix U such that the matrix


K c in (17.69) satisfies the scaling constraints in (17.32), we perform the
432 Block-State Realization of IIR Digital Filters

eigenvalue-eigenvector decomposition for the symmetric positive-definite


matrix P −1/2 K c P −1/2 as
1 1
P − 2 K c P − 2 = RΘRT (17.70)

where Θ = diag{θ1 , θ2 , · · · , θn } with θi > 0 and R is an orthogonal matrix.


Next, an orthogonal matrix S such that
⎡ ⎤
1 ∗ ··· ∗
⎢ . . .. ⎥
⎢ ∗ 1 . . ⎥
SΘS T = ⎢ . . ⎥ (17.71)
⎣ .. . . . . . ∗ ⎦
∗ ··· ∗ 1

can be obtained by numerical manipulations [3, p. 278]. Using (17.69), (17.70)


and (17.71), it can be readily verified that the orthogonal matrix U = RS T
leads to a K c in (17.69) whose diagonal elements are equal to unity, hence
the constraints in (17.32) are now satisfied. This matrix T together with
(17.68) gives the solution of the problem of minimizing (17.53) subject to
the constraints in (17.32) as
1
T = P 2 RS T (17.72)

17.4.2.2 Method 2: using a Quasi-Newton algorithm


Since the state-space model in (17.1) is stable and controllable, the con-
1/2
trollability Grammian K c is symmetric and positive-definite. Hence K c
1/2 1/2
satisfying K c = K c K c is also symmetric and positive-definite.
By defining
− 12
T̂ = T T K c (17.73)
the l2 -scaling constraints in (17.32) can be written as
−T −1
(T̂ T̂ )ii = 1 for i = 1, 2, · · · , n (17.74)
−1
The constraints in (17.74) simply state that each column in matrix T̂ must
−1
be a unit vector. Matrix T̂ is assumed to have the form
# $
−1 t1 t2 tn
T̂ = , ,··· , (17.75)
||t1 || ||t2 || ||tn ||
17.4 l2 -Sensitivity Analysis and Minimization 433

so that (17.74) is always satisfied. Using (17.73), it follows from (17.53) that
L L
1   T  T L+1
Si (T̂ )ave = tr T̂ N̂ ij (T̂ )T̂ + tr T̂ Ŵ o T̂ + n +
L 2
i=1 j=1
(17.76)
where
1 1 T 1 1 1 1
N̂ ij (T̂ ) = K c2 N ij (K c2 T̂ T̂ K c2 )K c2 , Ŵ o = K c2 W o K c2

From the foregoing arguments, the problem of obtaining an n × n nonsingular


matrix T which minimizes the average l2 -sensitivity Si (T )ave in (17.53)
subject to the l2 -scaling constraints in (17.32) can be converted into an
unconstrained optimization problem of obtaining an n × n nonsingular matrix
T̂ in (17.75) which minimizes Si (T̂ )ave in (17.76).
We now apply a quasi-Newton algorithm [13] to minimize (17.76) with
respect to matrix T̂ in (17.75). Let x be the column vector that collects the
independent variables in matrix T̂ , i.e.,

x = (tT1 , tT2 , · · · , tTn )T (17.77)

Then, Si (T̂ )ave is a function of x and is denoted by Jo (x). The algorithm starts
with a trivial initial point x0 obtained from an initial assignment T̂ = I n .
Then, in the kth iteration, a quasi-Newton algorithm updates the most recent
point xk to point xk+1 as

xk+1 = xk + αk dk (17.78)

where
 
dk = −S k ∇Jo (xk ), αk = arg min Jo (xk + αdk )
α
% & T
γ Tk S k γ k δ k δ Tk − δ k γ Tk S k +S k γ k δ k
S k+1 = S k + 1 +
γ Tk δ k γ k δk
T γ k δk
T

S 0 = I, δ k = xk+1 −xk , γ k = ∇Jo (xk+1 ) − ∇Jo (xk )

In the above, ∇Jo (x) is the gradient of Jo (x) with respect to x, and S k is a
positive-definite approximation of the inverse Hessian matrix of Jo (x).
This iteration process continues until

|Jo (xk+1 ) − Jo (xk )| < ε (17.79)


434 Block-State Realization of IIR Digital Filters

is satisfied where ε > 0 is a prescribed tolerance. If the iteration is terminated


at step k, the xk is taken to be the solution of the minimization problem.
The implementation of (17.78) requires the computation of ∇Jo (x) in
each iteration, hence the availability of a closed-form formulation to compute
∇Jo (x) will make the algorithm considerably faster relative to the evaluation
of ∇Jo (x) based on numerical differentiation. To this end, note that the
gradient of Jo (x) with respect to tpq is defined by

∂Jo (x) Si (T̂ pq )ave − Si (T̂ )ave


= lim (17.80)
∂tpq Δ→0 Δ

where T̂ pq is the matrix obtained from T̂ with its (p, q)th component perturbed
by Δ and it is follows that [14, p. 655]
ΔT̂ g pq eTq T̂
T̂ pq = T̂ + T̂ + ΔT̂ g pq eTq T̂
1− ΔeTq T̂ g pq
(17.81)
' ( 1
tq
g pq = −∂ ||tq ||
/∂tpq = (tpq tq − ||tq ||2 ep )
||tq ||3
The gradient of Jo (x) with respect to tpq can be evaluated using closed-form
expressions as
∂Jo (x)
= 2 β1 − β2 + β3 (17.82)
∂tpq
where
L L
1  T T
β1 = eq T̂ N̂ ij (T̂ )T̂ T̂ g pq
L
i=1 j=1

L L
1   T −T T
β2 = eq T̂ M̂ ij (T̂ )g pq , β3 = eTq T̂ Ŵ o T̂ T̂ g pq
L
i=1 j=1

−1 1 T 1
− 12
M̂ ij (T̂ ) = K c 2 M ij (K c2 T̂ T̂ K c2 )K c

17.4.3 l2 -Sensitivity Minimization Without Imposing


l2 -Scaling Constraints
The method described above can also be utilized to minimize an average l2 -
sensitivity measure in (17.53) without imposing the l2 -scaling constraints in
17.4 l2 -Sensitivity Analysis and Minimization 435

(17.32). This can be done by simply setting the Lagrange multiplier λ to zero
in (17.58). In this case, (17.64) is changed to
1 1 1 1 1
P k+1 = F (P k )− 2 F (P k ) 2 G(P k )F (P k ) 2 2 F (P k )− 2 (17.83)

where
L L
1 
G(P ) = M ij (P ) + K c
L
i=1 j=1

With an initial estimate P 0 , the recursive process in (17.83) continues until


 
Si (P k+1 )ave − Si (P k )ave  < ε (17.84)

is satisfied where ε > 0 is a prescribed tolerance.


If the iteration is terminated at step k, then we take P = P k as the
solution. A coordinate transformation matrix T that minimizes the average
l2 -sensitivity measure in (17.53) is then found to be
1
T = P 2U (17.85)

where P 1/2 is the square root of the matrix P obtained above, and U is any
n × n orthogonal matrix.

17.4.4 Numerical Experiments


A. Filter Description and Its Controllability and Observability
Grammians
Consider the 4th-order Butterworth lowpass filter (Ao , bo , co , d)4 with a
narrow normalized passband of 0.05, described by
⎡ ⎤ ⎡ ⎤
3.589734 1 0 0 0.237096
⎢ −4.851276 0 1 0 ⎥ ⎢ 0.035885 ⎥
Ao = ⎢⎣ 2.924053 0 0 1 ⎦ ,
⎥ bo = 10−3 ⎢
⎣ 0.216300 ⎦

−0.663010 0 0 0 0.010527
 
co = 1 0 0 0 , d = 3.123898 × 10−5

whose numerator b and denominator a was found using MATLAB as [b, a] =


butter(4, 0.05). Applying a coordinate transformation defined by

T o = diag{ 0.226458, 0.588059, 0.513017, 0.150144 }


436 Block-State Realization of IIR Digital Filters

to the above state-space model, we obtained an equivalent state-space model


(A, b, c, d)4 that satisfies l2 -scaling constraints as
⎡ ⎤
3.589734 2.596768 0 0
⎢ −1.868197 0 0.872390 0 ⎥
A=⎢ ⎣ 1.290747

0 0 0.292669 ⎦
−1.000000 0 0 0
 T
b = 10−3 1.046973 0.061023 0.421624 0.070114
 
c = 0.226458 0 0 0 , d = 3.123898 × 10−5
where
A = T −1
o Ao T o , b = T −1
o bo , c = co T o
and its controllability and observability Grammians K c and W o were
computed by solving the Lyapunov equations
K c = AK c AT + bbT , W o = A T W o A + cT c
as
⎡ ⎤
1.000000 −0.999248 0.997433 −0.994918
⎢ −0.999248 1.000000 −0.999452 0.998047 ⎥
Kc = ⎢ ⎣ 0.997433 −0.999452

1.000000 −0.999566 ⎦
−0.994918 0.998047 −0.999566 1.000000
⎡ ⎤
1.063597 2.747677 2.360090 0.672994
⎢ 2.747677 7.172055 6.224575 1.793652 ⎥
W o = 104 ⎢ ⎣ 2.360090 6.224575 5.458399 1.589267 ⎦

0.672994 1.793652 1.589267 0.467539



Since 2n = 5.656854 for the filter order n = 4, the blocklength is set to
L = 6 in this section.

B. Roundoff Noise
The gain of the original average roundoff noise in (17.27) was computed as
1
tr[W o ] + 1 = 2.360365 × 104
6
which was compared with the roundoff noise gain for the SISO state-space
model shown in (17.20) as
tr[W o ] + 1 = 14.161690 × 104
17.4 l2 -Sensitivity Analysis and Minimization 437

The optimal coordinate transformation matrix that minimizes the average


roundoff noise in (17.30) was constructed using (17.33) as
⎡ ⎤
−0.287484 −0.559872 −0.450804 −0.403517
⎢ 0.257608 0.607739 0.448499 0.418814 ⎥
T =⎢ ⎣ −0.224220 −0.651540 −0.442903 −0.441924 ⎦

0.190114 0.693107 0.430204 0.473354


and the controllability and observability Grammians were found to be

K c = T −1 K c T −T
⎡ ⎤
1.000000 0.295942 0.341255 0.475526
⎢ 0.295942 1.000000 −0.341255 −0.475526 ⎥
=⎢⎣ 0.341255 −0.341255

1.000000 0.909588 ⎦
0.475526 −0.475526 0.909588 1.000000

W o = T T W oT
⎡ ⎤
0.138885 0.041102 0.047395 0.066044
⎢ 0.041102 0.138885 −0.047395 −0.066044 ⎥
=⎢⎣ 0.047395 −0.047395

0.138885 0.126328 ⎦
0.066044 −0.066044 0.126328 0.138885
The minimum gain of the average roundoff noise in (17.30) was computed as
1
tr[T T W o T ] + 1 = 1.092590
6
which was compared with the roundoff noise gain for the equivalent
realization, (T −1 AT , T −1 b, cT , d)n , of a SISO state-space model as
tr[T T W o T ] + 1 = 1.555541

C. Initial l2 -Sensitivity
The average l2 -sensitivity measure for the block-state realization described
by (17.34) was computed from (17.45) as
(Si )ave = 40.933372 × 104
which was compared with the l2 -sensitivity measure for the SISO state-space
model shown in (17.46) as
So = 977.917589 × 104
438 Block-State Realization of IIR Digital Filters

D. Minimization of l2 -Sensitivity Subject to l2 -Scaling


Constraints
1) The Use of A Lagrange Function
The recursive matrix equation in (17.64) together with (17.66) was applied to
minimize (17.58) with tolerance ε = 10−8 in (17.67) and initial estimate I 4 .
It took the algorithm 84 iterations to converge to the solution
⎡ ⎤
0.875338 −0.904805 0.932402 −0.958480
⎢ −0.904805 0.938156 −0.969673 0.999738 ⎥
P =⎢ ⎣ 0.932402 −0.969673

1.005245 −1.039528 ⎦
−0.958480 0.999738 −1.039528 1.078311

and from (17.72)


⎡ ⎤
0.327836 0.559715 0.663273 0.121038
⎢ −0.313941 −0.586064 −0.699777 −0.080238 ⎥
T =⎢⎣ 0.301577

0.604055 0.740336 0.036271 ⎦
−0.294198 −0.614717 −0.783389 0.013515

was obtained. The new controllability Grammian was found to be


⎡ ⎤
1.000000 0.300486 0.342142 0.474457
⎢ 0.300486 1.000000 −0.342142 −0.474457 ⎥
Kc = ⎢ ⎣ 0.342142 −0.342142


1.000000 0.910932
0.474457 −0.474457 0.910932 1.000000

and the average l2 -sensitivity measure for the block-state realization described
by (17.34) was computed from (17.53) as

Si (T )ave = 8.759262

which was compared with the l2 -sensitivity measure for the equivalent
realization, (A, b, c, d)n , of a SISO state-space model shown in (17.54) as

So (T ) = 29.500326

The profiles of the average l2 -sensitivity measure Si (P )ave in (17.55) and


tr[K c P −1 ] during the first 84 iterations of the algorithm are depicted in
Figures 17.4.
17.4 l2 -Sensitivity Analysis and Minimization 439
6
10

Si (P k-1)ave
4
10

2
10

0
10
1 10
k

1
10

0
10
tr[K c Pk-1 ]
-1

-1
10

-2
10

1 10
k
Figure 17.4 Profiles of Si (P )ave and tr[K c P −1 ] during the first 84 iterations.

2) The Use of A Quasi-Newton Algorithm


The quasi-Newton algorithm was applied to minimize (17.76) by choosing
T̂ = I 4 as an initial assignment, and setting tolerance to ε = 10−8 in (17.79).
It took the algorithm 22 iterations to converge to the solution
⎡ ⎤
0.571134 −0.415538 −1.326551 0.389172
⎢ 0.407792 0.933965 −1.087187 −2.120849 ⎥
T̂ = ⎢⎣ 0.043983 −0.117207

3.040058 2.480486 ⎦
−0.373909 0.428710 1.758732 2.561697
440 Block-State Realization of IIR Digital Filters

which is equivalent to
⎡ ⎤
−0.307934 0.205102 0.403183 −0.758913
⎢ 0.350071 −0.233366 −0.379012 0.785847 ⎥
T =⎢ ⎣ −0.385730

0.267285 0.350018 −0.813980 ⎦
0.414649 −0.303751 −0.313832 0.845979
The new controllability Grammian was found to be
⎡ ⎤
1.000000 0.609669 0.656806 −0.159006
⎢ 0.609669 1.000000 0.530787 0.197938 ⎥
Kc = ⎢⎣ 0.656806 0.530787

1.000000 −0.674646 ⎦
−0.159006 0.197938 −0.674646 1.000000
and the average l2 -sensitivity measure for the block-state realization described
by (17.34) was computed from (17.53) as
Si (T )ave = 8.759262
which was compared with the l2 -sensitivity measure for the equivalent
realization, (A, b, c, d)n , of a SISO state-space model, shown in (17.54) as
So (T ) = 29.500303
The profile of the average l2 -sensitivity measure Jo (x) during the first 22
iterations of the algorithm is depicted in Figure 17.5.

15
Jo (xk )

10

5
0 10 20
k
Figure 17.5 Profile of Jo (x) during the first 22 iterations.
References 441

E. Minimization of l2 -Sensitivity Without Imposing


l2 -Scaling Constraints
The recursive matrix equation in (17.83) was applied to minimize (17.55)
with tolerance ε = 10−8 in (17.84). Choosing P 0 = I 4 in (17.83), it took the
algorithm 13 iterations to converge to the solution
⎡ ⎤
2.204544 −2.275660 2.342049 −2.404601
⎢ −2.275660 2.356583 −2.432877 2.505500 ⎥
P =⎢ ⎣ 2.342049 −2.432877

2.519439 −2.602761 ⎦
−2.404601 2.505500 −2.602761 2.697538
which yields
⎡ ⎤
0.793097 −0.756746 0.723513 −0.692392
1 ⎢ −0.756746 0.771434 −0.773723 0.768219 ⎥
T =P2 =⎢
⎣ 0.723513 −0.773723

0.818655 −0.852715 ⎦
−0.692392 0.768219 −0.852715 0.949130
The new controllability Grammian was found to be
⎡ ⎤
0.819811 0.097982 0.106633 −0.052902
⎢ 0.097982 0.482067 0.066716 0.095486 ⎥
Kc = ⎢⎣ 0.106633 0.066716

0.146989 −0.020279 ⎦
−0.052902 0.095486 −0.020279 0.044099
and the average l2 -sensitivity measure for the block-state realization described
by (17.34) was computed from (17.53) as

Si (T )ave = 7.190177

which was compared with the l2 -sensitivity measure for the equivalent
realization, (A, b, c, d)n , of a SISO state-space model shown in (17.54) as

So (T ) = 28.210150

The profile of the average l2 -sensitivity measure Si (P )ave in (17.55) during


the first 13 iterations of the algorithm is shown in Figure 17.6.

17.5 Summary
In this chapter, we have considered the block-state realization that is derived
from a given SISO state-space model, and examined some of the properties of
the block-state realization. Second, we have analyzed the roundoff noise in the
442 Block-State Realization of IIR Digital Filters
6
10

Si (P k-1)ave
4
10

2
10

0
10
1 10
k
Figure 17.6 Profile of Si (P )ave during the first 13 iterations.

block-state realization and minimized the average roundoff noise gain subject
to l2 -scaling constraints. Third, we have analyzed l2 -sensitivity in the block-
state realization and minimized the average l2 -sensitivity subject to l2 -scaling
constraints where two methods have been presented. One has been based on
a Lagrange function, while the other has relied on an efficient quasi-Newton
algorithm. Finally, numerical experiments have been presented to demonstrate
the validity and effectiveness of the techniques addressed in this chapter.

References
[1] S. Y. Hwang, “Roundoff noise in state-space digital filtering: A general
analysis,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24,
no. 3, pp. 256–262, June 1976.
[2] C. T. Mullis and R. A. Roberts, “Synthesis of minimum roundoff noise
fixed point digital filters,” IEEE Trans. Circuits Syst., vol. CAS-23, no, 9,
pp. 551–562, Sept. 1976.
[3] S. Y. Hwang, “Minimum uncorrelated unit noise in state-space digital
filtering,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25,
no. 4, pp. 273–281, Aug. 1977.
[4] C. W. Barnes and S. Shinnaka, “Finite word effects in block-state
realization of fixed-point digital filters,” IEEE Trans. Circuits Syst., vol.
CAS-27, no. 5, pp. 345–349, May 1980.
References 443

[5] C. W. Barnes and S. Shinnaka, “Block-shift invariance and block


implementation of discrete-time filters,” IEEE Trans. Circuits Syst., vol.
CAS-27, no. 8, pp. 667–672, Aug. 1980.
[6] J. Zeman and A. G. Lindgren, “Fast digital filters with low roundoff
noise,” IEEE Trans. Circuits Syst., vol. CAS-28, no. 7, pp. 716–723,
July 1981.
[7] W.-Y. Yan and J. B. Moore, “On L2 -sensitivity minimization of linear
state-space systems,” IEEE Trans. Circuits Syst. I, vol. 39, no. 8, pp.
641–648, Aug. 1992.
[8] T. Hinamoto, S. Yokoyama, T. Inoue, W. Zeng and W.-S. Lu, “Analy-
sis and minimization of L2 -sensitivity for linear systems and two-
dimensional state-space filters using general controllability and observ-
ability Grammians,” IEEE Trans. Circuits Syst. I, vol. 49, no. 9, pp.
1279–1289, Sept. 2002.
[9] T. Hinamoto, H. Ohnishi and W.-S. Lu, “Minimization of L2 -sensitivity
for state-space digital filters subject to L2 -dynamic-range scaling con-
straints,” IEEE Trans. Circuits Syst.-II, vol. 52, no. 10, pp. 641–645, Oct.
2005.
[10] T. Hinamoto, K. Iwata and W.-S. Lu, “L2 -sensitivity Minimization
of one- and two-dimensional state-space digital filters subject to L2 -
scaling constraints,” IEEE Trans. Signal Processing, vol. 54, no. 5,
pp. 1804–1812, May 2006.
[11] T. Hinamoto, O. I. Omoifo and W.-S. Lu, “Realization of MIMO linear
discrete-time systems with minimum l2 -sensitivity and no overflow
oscillations,” in Proc. ISCAS 2006, pp. 5215–5218.
[12] R. A. Meyer and C. S. Burrus, “A unified analysis of multirate and
periodically time-varying digital filters,” IEEE Trans. Circuits Syst., vol.
CAS-22, no. 3, pp. 162–168, Mar. 1975.
[13] R. Fletcher, Practical Methods of Optimization, 2nd ed., Wiley,
New York, 1987.
[14] T. Kailath, Linear System, Englewood Cliffs, N.J.: Prentice-Hall, 1980.
Index

ρ operator-based IIR digital filter 387, 390
ρ-operator transposed direct-form II structure 393, 397, 398, 404
l1-scaling 262
l2-scaling 280, 285, 310, 362
l2-scaling constraints 318, 321, 392, 422
l2-sensitivity 273, 299, 400, 405
l2-sensitivity measure 13, 23, 25, 280
l2-sensitivity minimization 280, 285, 290, 429
z-transform 13, 23, 25, 280
3 dB cutoff frequency 48
3 dB rejection bandwidth 48

A
adder 37, 213, 239
adder’s overflow 266, 267, 268
algebraic equivalence 79
aliasing 16, 17, 20, 22
all-pass complementary 53
all-pass digital filter 45, 48, 164, 170
alternating optimization 243
alternation theorem 121
amplitude characteristic 43, 170, 171
amplitude response 101, 121, 228, 230
analog filter theory 135, 141
analog signal 2, 6, 10, 19
analog-to-digital (A/D) converter 4
analytical approach 117, 129
antisymmetric impulse response 104, 105, 106, 107
arithmetic-geometric mean inequality 51, 270, 305, 309
asymptotically stable 92, 202, 203, 268
autocorrelation coefficient 330, 332
autocorrelation matrix 330, 420
average l2-sensitivity 412, 426, 427, 442
average l2-sensitivity measure 428, 434, 438, 440
average roundoff noise 412, 422, 436, 442
average roundoff noise variance 422

B
balanced 73, 80, 90, 91
balanced model reduction 196, 209, 210, 211
balanced realization 89, 90, 198, 200
balanced state-space model 80, 81
bandpass filter 98, 113, 115, 148
bandstop filter 98, 114, 115, 146
Bartlett window 111, 113
Bellanger’s formula 100
bilinear-transformation method 135, 143, 148
bisection method 300, 311, 323
Blackman window 111, 113
block diagram 37, 98, 358, 364
block-state realization 411, 412, 415, 438
bounded-input bounded-output stability 57
Broyden-Fletcher-Goldfarb-Shanno algorithm 314
Butterworth approximation 136
Butterworth filter 136, 142, 147

C
canonical decomposition 84
cascaded lattice realization 47, 48, 124
Cauchy’s integral theorem 30, 276, 331
causality 23, 36
Cayley-Hamilton theorem 73, 77, 82, 188
characteristic equation 73, 269
characteristic polynomial 269
Chebyshev approximation 120, 131, 133, 136
Chebyshev filter 136, 137, 138, 147
Cholesky decomposition 79
coefficient quantization 97, 253, 299, 412
coefficient sensitivity 57, 64, 254, 263, 299
coefficient sensitivity minimization 273
composite filter 239, 240, 250
continuous optimization 245
continuous-time signal 2, 7, 15, 254
continuous-time sinusoidal signal 14, 16, 21
continuous-time transfer function 135
contour integral 29, 30, 32
controllability Grammian 70, 197, 332, 418
controllability Grammian invariance 418
controllability invariance 416
controllability matrix 69, 82, 197, 417
controllable 68, 79, 81, 182
controllable canonical form 74, 75, 89, 412
convex problem 217, 224, 244, 247
convex programming 161
convex quadratic approximation 246
convex quadratic constraint 220, 224, 244
convex quadratic programming (QP) 157, 160, 172
convex quadratic programming (QP) problem 160
convex-concave procedure (CCP) 213, 217
convexification 219, 221, 222, 236
convolution 28, 35
coordinate transformation 73, 198, 276, 362
cosine sequence 23, 24
coupling coefficient 384, 393
crosscorrelation vector 330
cutoff frequency 111, 112

D
data throughput rate 411
delta operator 357, 383, 384, 388
diesel engine signal 3
difference equation 23, 37, 40, 64
differentiation 28, 64, 263, 434
digital signal 2, 4, 5, 21
digital signal processing 4, 5, 21
digital transversal filter 97
digital-to-analog (D/A) converter 5
Dirac delta function 11, 14
discrete Fourier transform (DFT) 13
discrete-time Fourier transform 11, 12, 20
discrete-time sequence 23, 54
discrete-time signal 10, 20, 23, 54
discrete-time sinusoidal signal 15
discrete-time system 23, 34, 46, 95
discrete-time transfer function 135
double-precision accumulator 328
doubly complementary 23, 53, 54, 55
dual 78
dual system 42
duality 78, 417
dynamic-range scaling 310, 392

E
eigenvalue-eigenvector decomposition 79, 197, 287, 363
electrocardiograph signal 3
electroencephalogram signal 3
elliptic approximation 135, 138
elliptic filter 138, 147
equiripple 165, 166, 239, 248
equiripple design 155
equivalence invariance 417
equivalent state-space description 266, 368
equivalent state-space realization 388, 390, 396, 400
equivalent transformation 73, 80, 83, 95
error feedback 328, 332, 333, 338
error feedforward 364, 365, 368
error spectrum shaping 327
Euler’s formula 6
exponential sequence 23, 24, 25, 42

F
Faddeev’s formula 67, 71, 72, 199
final-value theorem 8, 29
finite impulse response 38, 97, 110, 175
finite-length register 253, 258, 260, 263
finite-word-length (FWL) constraint 254, 338, 357, 383
finite-word-length (FWL) effect 273, 357, 383
finite-word-length implementation 338
finite-word-length realization 411
FIR digital filter 38, 97, 121, 124
first-order information 179
fixed point 254, 260
fixed-point arithmetic 254, 257, 357
fixed-point implementation 263, 328, 405
fixed-point number 254, 260
floating point 257
floating-point arithmetic 5, 254, 257, 270
flow-graph reversal 39
forward shift operator 383
Fourier series expansion 6, 17, 18, 108
Fourier transform 7, 13, 20, 33
frequency domain 7, 151, 161, 168
frequency response 42, 53, 98, 113
frequency response error 164, 170, 171, 172
frequency transfer function 42, 100, 103, 109
frequency transformation 111, 114
frequency-response-masking FIR filter 213, 239
Frobenius norm 303
fundamental frequency 6

G
Gaussian random process 339, 366
general FIR filter design 117, 129, 133
generalized transposed direct-form II structure 384, 409
Gray-Markel’s lattice filter 46
group delay 44, 163, 235, 241

H
Hamming window 111, 113
Hankel matrix 85, 88, 91, 196
Hanning window 111, 113
high-order error feedback 327, 328, 338, 339
highpass filter 98, 99, 112, 338

I
image signal 2
improved l2-sensitivity measure 400, 402, 403
impulse function 10, 11
impulse response 38, 70, 101, 141
incomplete Hankel matrix 88
initial-value theorem 9, 29
input-normal 79, 80, 90
input-normal realization 90
input-normal state-space model 80
input-quantization 253, 254
internal structure 67, 266, 357, 415
interpolated FIR filter 213, 215, 239
interpolation formula 21, 124
interpolation function 21
invariant impulse-response method 135, 141
inverse z-transform 23, 30, 46, 330
inverse discrete Fourier transform (IDFT) 13
inverse discrete-time Fourier transform 12
inverse Fourier transform 7, 20
inverse Hessian matrix 282, 314, 370, 433
inverse Laplace transform 8
inverse relation 193
inverse-Chebyshev approximation 135, 137
inverse-Chebyshev filter 137, 138
irreducibility invariance 417

J
Jury-Marden criterion 57, 61, 62, 65

K
Kaiser’s formula 99
Kalman’s canonical structure theorem 67, 81, 83
Karush-Kuhn-Tucker condition 278
Karush-Kuhn-Tucker conditions 286, 429

L
Lagrange function 180, 285, 293, 318
Lagrange multiplier 286, 330, 362, 429
Laplace transform 8, 13, 14, 18
Laplace transform pairs 9
least-squares approximation 173, 174, 211
least-squares design 114, 128, 157, 190
limit cycle 254, 257, 266, 268
limit cycle-free realization 254, 266, 270
linear phase 97, 100, 118, 130
linear programming 155, 156, 171
linear-equation error 192, 193, 194
linear-phase filter 100
linear-phase FIR composite filter 240, 241
linear-phase FIR filter design 118, 130, 133
linearity 23, 27, 34, 257
lossless bounded-real (LBR) 92
lossless bounded-real (LBR) lemma 67, 91, 95
lowpass filter 98, 100, 214, 349
lowpass-to-bandpass transformation 140
lowpass-to-highpass transformation 140
lowpass-to-lowpass transformation 141
Lyapunov criterion 57, 62, 65
Lyapunov equation 70, 201, 278, 404
Lyapunov function 267
Lyapunov stability theorem 92, 93, 182

M
magnitude spectrum 7, 12, 13
mapping 15, 143, 415, 422
matrix inversion 118, 185
matrix inversion formula 283, 370
maximum rank decomposition 86, 89
minimal 68, 85, 87
minimal l2-sensitivity 277, 280
minimal partial realization 67, 87, 89
minimal pole sensitivity 299, 306, 309, 323
minimal realization 85, 92, 94, 417
minimax design 161, 172, 218, 222
minimum mean squared error design 151
modified least squares problem 173, 174
multidimensional (M-D) signal 2
multiplier 37, 46, 180, 300
musical sound 3

N
new realization 281, 307, 351, 422
noise variance 329, 420, 422
nonlinear system 267, 268
nonrecursive digital filter 97
normal matrix 260
normalized frequency 16, 146, 383
normalized lattice two-pair 47, 48
normalized noise gain 329, 340, 366, 394
notch filter 51, 52, 383
notch frequency 48, 50
Nyquist frequency 19, 20

O
observability Grammian 70, 81, 201, 277
observability Grammian invariance 418
observability invariance 417
observability matrix 69, 82, 92, 197
observable 67, 68, 84, 204
observable canonical form 77, 78
one’s complement 254
one-dimensional (1-D) signal 1
optimal blocklength 411, 414
optimal coordinate transformation 287, 299, 323, 431
optimal realization 280, 317, 364, 374
output equation 68, 419
output-normal 67, 73, 80
output-normal realization 91
output-normal state-space model 79
overflow 254, 256, 258, 266
overflow oscillation 257, 258, 270

P
Padé’s approximation 175, 204, 210
Parks-McClellan’s algorithm 120, 133, 234, 248
Parseval’s formula 179
Parseval’s theorem 8, 12, 13, 33
partial realization 67, 87, 89
partial-fraction expansion 23, 30, 142, 143
peak gain 241, 248, 250
peak-to-peak amplitude ripple 241
phase characteristic 43, 98, 109, 167
phase characteristic error 167, 168, 170, 171
phase delay 101
phase spectrum 7, 12, 13
pole and zero sensitivity 299, 300, 306, 318
pole sensitivity matrix 301
pole sensitivity measure 303, 304
polynomial operators 357, 385
power complementary 53
power series expansion 23, 31
power spectrum 12
product quantization 358, 393, 396, 405
projection theorem 193
prototype filter 143, 239, 240, 249

Q
quadratic-measure minimization 114, 128, 133
quantization 97, 255, 266, 328
quantization error 253, 327, 328, 405
quantization step size 255, 260, 262
quantized boxcar signal 2
quantizer 255, 328, 338, 355
quasi-Newton algorithm 153, 280, 312, 368

R
rectangular window 110, 133
recursive matrix equation 273, 293, 438, 441
recursive second-order matrix equation 279, 308, 311, 430
reduced complexity 225
reduced-order approximation 81, 91
reduced-order subsystem 198
Remez algorithm 165, 168
Remez exchange algorithm 123, 131, 166, 170
Remez iterative algorithm 169
right nullspace 200, 201
rounding 253, 255, 273, 355
roundoff error 257, 266, 339, 366
roundoff noise 98, 262, 357, 393
roundoff noise minimization 361, 362, 422

S
sampled-data signal 2
sampling 10, 14, 20, 146
sampling frequency 10, 15, 19, 146
sampling period 10, 24, 146
sampling rate 15, 16, 17, 383
sampling theorem 1, 17, 20, 22
saturation overflow 256, 257, 258
scaling rule 262, 281
Schur-Cohn criterion 57, 60, 61, 65
Schur-Cohn-Fujiwara criterion 57, 60, 65
Schwarz inequality 304
second-order information 178, 179, 205, 211
seismic signal 3, 4
shaping filter 239, 240, 241, 247
signed magnitude 254
similarity transformation 73
single multiplier lattice two-pair 47
singular value decomposition 90
sparsity 214, 225, 230, 236
spectral norm 200
speech signal 1, 2, 3, 4
stability 23, 57, 60, 62
stability constraints 157, 160, 161
stability criteria 57
stability triangle 52, 65, 160, 269
state equation 68, 69, 258, 266
state-space description 40, 67, 73, 264
state-space filter structure 273, 357, 412
state-space model 67, 79, 80, 95
state-space realization 388, 390, 396, 404
symmetric impulse response 101, 102, 103

T
The transposed direct-form II structure 39, 387, 390, 409
time domain 46, 144, 173, 211
time series 3
time-invariance 23, 35
Toeplitz matrix 330
transfer function 37, 43, 70, 136
transposed direct-form II 383
transposed direct-form II structure 39, 393, 397, 409
transposed filter structure 39
transposed form 39, 41
transposition 39
trigonometric component 218, 241
truncation 110, 257, 268, 328
two multiplier lattice two-pair 46, 47
two’s complement 254, 255, 257, 270
two’s complement overflow 256, 257, 258
two’s complement representation 254, 256, 257, 270
two’s complement truncation 328
two-dimensional (2-D) signal 2

U
unconstrained optimization problem 282, 313, 369, 433
uncontrollable 68, 82, 84
unit delay 37, 41
unit pulse sequence 23, 24, 35
unit ramp sequence 23, 24
unit step sequence 23, 24, 26
unit-pulse response 36, 71
unobservable 68, 82, 84

V
video signal 2, 4

W
weighted least-squares design 157, 172
weighted pole and zero sensitivity measure 299, 307, 309, 323
window function 108, 110, 113, 133

Z
zero sensitivity matrix 301
zero sensitivity measure 299, 307, 318, 323
zero-input overflow 259, 260
zero-phase frequency response 218, 222, 223, 241
About the Authors

Takao Hinamoto received his B.E. degree from Okayama University,
Okayama, Japan, in 1969, his M.E. degree from Kobe University, Kobe,
Japan, in 1971, and his Doctorate in Engineering from Osaka University,
Osaka, Japan, in 1977.
From 1972 to 1988, he was with the Faculty of Engineering, Kobe
University, and from 1979 to 1981 he was a visiting member of staff in
the Department of Electrical Engineering, Queen's University, Kingston, ON,
Canada. From 1988 to 1991, he was Professor of Electronic Circuits in the
Faculty of Engineering, Tottori University, Tottori, Japan, and from 1992
to 2009 he was Professor of Electronic Control in the Department of
Electrical Engineering, Hiroshima University, Hiroshima, Japan. Since 2009,
he has been Professor Emeritus of Hiroshima University. His research interests
include digital signal processing, system theory, and control engineering.
He has published almost 450 papers in these areas. He is the coeditor and
coauthor of Two-Dimensional Signal and Image Processing (Tokyo, Japan:
SICE, 1996).
He was the Guest Editor of the special sections on Digital Signal Pro-
cessing as well as Adaptive Signal Processing and Its Applications in the
IEICE Transactions on Fundamentals in August 1998 and March 2005,
respectively. He was the Co-Guest Editor of the special section on Recent
Advances in Circuits and Systems in the July and August 2005 issues of
the IEICE Transactions on Information and Systems. In 1997, he was the
Chair of the 12th DSP Symposium held in Hiroshima, Japan, sponsored by
the DSP Technical Committee of IEICE. From 1993 to 2000, he was a Senator
or Member of the Board of Directors in the Society of Instrument and Control
Engineers (SICE), and from 1999 to 2001 he was Chair of the Chugoku
Chapter of SICE. From 2003 to 2004, he served as Chair of the DSP Technical
Committee of IEICE and Chair of the Chugoku Chapter of IEICE.
From 1993 to 1995, he served as an Associate Editor of the IEEE Transac-
tions on Circuits and Systems II. During 2002–2003 and 2006–2007, he served
as an Associate Editor of the IEEE Transactions on Circuits and Systems I.
In 2004, he served as the General Chair of the 47th IEEE International Midwest
Symposium on Circuits and Systems held in Hiroshima, Japan. Since 1995,
he has been a Steering Committee Member of the IEEE International Midwest
Symposium on Circuits and Systems. Since 1998, he has been a Digital Signal
Processing Technical Committee Member in the IEEE Circuits and Systems
Society. He played a leading role in establishing the Hiroshima Section of
IEEE and served as Interim Chair of the Section.
He received the IEEE Third Millennium Medal in January 2000. He was
elected a Fellow of the IEEE in 2001. In 2004, he was elected a Fellow of the
Institute of Electronics, Information and Communication Engineers (IEICE).
In 2005, he was elected a Fellow of the SICE. He became a Life Fellow of the
IEEE in 2011.

Wu-Sheng Lu received his undergraduate education in mathematics from
Fudan University, Shanghai, China, during 1959–1964, and received an
M.S. degree in electrical engineering and a Ph.D. degree in control science
from the University of Minnesota, Minneapolis, USA, in 1983 and 1984,
respectively.
He was a post-doctoral fellow at the University of Victoria, Victoria,
B.C., Canada, in 1985, and a visiting assistant professor at the University
of Minnesota from January 1986 to April 1987. He joined the Electrical and
Computer Engineering Department, University of Victoria, in 1987, where
he is a professor. His current research interests include analysis and design of
digital filters, digital signal and image processing with a focus on sparse signal
processing, and methods and applications of convex optimization. He is the
co-author with A. Antoniou of Two-Dimensional Digital Filters (Marcel
Dekker, 1992) and Practical Optimization: Algorithms and Engineering
Applications (Springer, 2007).
He served as editor for the Canadian Journal of Electrical and Com-
puter Engineering and associate editor for several journals including IEEE
Transactions on Circuits and Systems I, IEEE Transactions on Circuits and
Systems II, International Journal of Multidimensional Systems and Signal
Processing, and Journal of Circuits, Systems, and Signal Processing.
He received several awards for his teaching at the University of Victoria.
He was elected a Fellow of the Engineering Institute of Canada in 1994.
In 1999, he was elected a Fellow of the IEEE. He became a Life Fellow of the
IEEE in 2012. He is a registered professional engineer in British Columbia,
Canada.