DESIGN AND TEST ARCHITECTURE OF INTEGRATED-POWER-METER DIGITAL PART
Borisav Jovanović, Milunka Damnjanović, Predrag Petković, Faculty of Electronic Engineering Niš

Abstract – This paper considers the digital part of an Integrated Power Meter (IPM) IC. The IPM is an energy-measurement SoC that measures many important power-line signal parameters, including current and voltage RMS values, active and reactive power, frequency and energy. The digital part is considered here from the architectural and clocking points of view.

1. INTRODUCTION

Complete hardware and software multiprocessor systems are nowadays integrated into a single silicon circuit called a System-on-Chip (SoC) [1]. Despite the many problems attached to these complex designs, important advantages such as high speed, low cost, low power consumption and high reliability push system designers toward this design approach.

As can be seen in Fig. 2, the implementation of the Digital Part (DP) of the Integrated Power Meter (IPM) has five hard blocks and one large group of scattered cells at the top of the layout. The digital filters are divided into five hard blocks: SinC filters for current signal processing, FIR filters for current signal processing, voltage SinC filters, voltage FIR filters and the Hilbert filter. Standard cells from the DSP and CSP parts are placed at the top of the layout.

The clock divider circuit, a part of the DSP block, takes a 4.194 MHz clock signal at its input and provides clock signals for the other digital parts with frequencies of 524288 Hz, 65536 Hz, 16384 Hz, 8192 Hz and 4096 Hz.
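
The listed frequencies are all consistent with power-of-two division of the input clock. The following minimal Python sketch checks this. It assumes that the nominal 4.194 MHz clock is exactly 4194304 Hz (2^22 Hz), which is not stated explicitly in the text, and the function name is illustrative only.

```python
# Assumption: "4.194 MHz" denotes exactly 2**22 Hz = 4194304 Hz.
MASTER_CLOCK_HZ = 2 ** 22

# Output frequencies of the clock divider, as listed in the text.
DIVIDED_CLOCKS_HZ = [524288, 65536, 16384, 8192, 4096]


def division_ratios(master_hz, outputs_hz):
    """Return {output frequency: integer division ratio} for each output."""
    ratios = {}
    for f in outputs_hz:
        if master_hz % f != 0:
            raise ValueError(f"{f} Hz is not an integer division of {master_hz} Hz")
        ratios[f] = master_hz // f
    return ratios


if __name__ == "__main__":
    for f, ratio in division_ratios(MASTER_CLOCK_HZ, DIVIDED_CLOCKS_HZ).items():
        # For a power-of-two ratio, bit_length() - 1 equals log2(ratio).
        print(f"{f:>7} Hz = master / {ratio} (2^{ratio.bit_length() - 1})")
```

Under this assumption every output corresponds to one bit of a binary counter driven by the input clock.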

The system was verified through VHDL simulations, synthesized with Cadence tools and tested after fabrication.

In what follows, the DP of the chip is considered, starting with a description of its digital blocks. Next, its interface and clock tree generation are described. The testing environment is also presented.

2. DP BLOCKS

The most important digital blocks mentioned above (Fig. 1) are briefly described here.

The digital filters decimate the AD converter output signals, forming two separate decimation channels, one for voltage and one for current signal processing, each with a total decimation factor of 128. Each decimation channel (Fig. 3) consists of two SinC filters and two FIR filters.

The FIR filter design is based on the CSD (Canonical Signed Digit) representation, and the filter coefficients are hardwired. The internal structure of the filters is shown in Fig. 5.

![FIR filter’s architecture](image)
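
To illustrate why a CSD representation suits hardwired coefficients, the sketch below converts an integer coefficient to CSD digits (each -1, 0 or +1, with no two adjacent non-zero digits) and multiplies a sample using only shifts, additions and subtractions. It is a generic textbook conversion in Python, not the filter implementation from [3], and the coefficient value 23 is purely hypothetical.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first.

    Each digit is -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1,
            # so that the remainder becomes divisible by 4.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(sample, digits):
    """Multiply sample by a CSD-coded constant using shifts and add/subtract."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += sample << shift
        elif digit == -1:
            acc -= sample << shift
    return acc


if __name__ == "__main__":
    coeff = 23  # hypothetical coefficient: binary 10111 has four non-zero bits
    digits = to_csd(coeff)
    print(digits)                    # three non-zero digits: 32 - 8 - 1
    print(csd_multiply(5, digits))   # same result as 5 * 23
```

Fewer non-zero digits mean fewer adders per hardwired coefficient, which is the usual motivation for CSD coding in multiplierless filters.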

The first FIR filter in the decimation channel (see Fig. 3) consists of data processing blocks driven by two different clock signals, with frequencies of 16384 Hz and 8192 Hz. The second FIR filter uses clock signals of 8192 Hz and 4096 Hz. A detailed description of the FIR filters can be found in [3].
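
The total decimation factor of 128 can be cross-checked against the clock frequencies used by the filter stages. The Python sketch below assumes that each stage reads data at its higher clock rate and writes at its lower one, and that the two SinC stages step through 524288 Hz, 65536 Hz and 16384 Hz in that order. The per-stage assignment for the SinC filters is an assumption made for illustration.

```python
from math import prod

# (stage name, input rate in Hz, output rate in Hz)
# The SinC stage boundaries are assumed; the FIR rates are given in the text.
STAGES = [
    ("SinC 1", 524288, 65536),
    ("SinC 2", 65536, 16384),
    ("FIR 1", 16384, 8192),
    ("FIR 2", 8192, 4096),
]


def stage_factors(stages):
    """Return (name, decimation factor) for every stage."""
    return [(name, f_in // f_out) for name, f_in, f_out in stages]


def total_decimation(stages):
    """Return the product of all per-stage decimation factors."""
    return prod(factor for _, factor in stage_factors(stages))


if __name__ == "__main__":
    for name, factor in stage_factors(STAGES):
        print(f"{name}: decimate by {factor}")
    print("total:", total_decimation(STAGES))
```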

The Hilbert transformer filter is also a separate block on the chip. It uses a clock signal with a frequency of 4096 Hz.

Since the digital filters of the IPM IC are divided into hard blocks, and four of these hard blocks each use three different clock signals, clock tree generation was a challenging task. It is described in detail in Section 4.

The digital filters produce 24-bit signed two's complement samples of the current I, the voltage V and the phase-shifted voltage Vp, which enter the DSP part (Fig. 1). The data rate of these voltage and current samples is exactly 4096 Hz.
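
As a reminder of how such 24-bit two's complement words map to signed values, for example when interpreting raw words in simulation output, a minimal Python sketch follows. The helper names are illustrative and are not taken from the design.

```python
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFFFFFF
SIGN_BIT = 1 << (WORD_BITS - 1)    # 0x800000


def to_signed24(raw):
    """Interpret the low 24 bits of raw as a two's complement value."""
    raw &= WORD_MASK
    return raw - (1 << WORD_BITS) if raw & SIGN_BIT else raw


def from_signed24(value):
    """Encode a signed integer as a 24-bit two's complement word."""
    if not -SIGN_BIT <= value < SIGN_BIT:
        raise ValueError("value does not fit into 24 bits")
    return value & WORD_MASK


if __name__ == "__main__":
    for word in (0x000001, 0x7FFFFF, 0x800000, 0xFFFFFF):
        print(f"0x{word:06X} -> {to_signed24(word)}")
```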

The DSP block (Fig. 1) operates at a clock frequency of 4.194 MHz and calculates the current and voltage RMS values, apparent power S, active power P, reactive power Q, power factor \(\cos\phi\), frequency F, reactive energy Eq and active energy Ea.
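
The relationship between the three sample streams and the computed quantities can be sketched with the textbook definitions. The Python code below works in floating point over a one-second block of samples and is not the fixed-point implementation of the chip (see [2], [7], [8] for that). It assumes that Vp is the voltage delayed by 90 degrees, so that Q is positive for a lagging current; this sign convention is an assumption.

```python
import math

SAMPLE_RATE_HZ = 4096  # data rate of the I, V and Vp samples


def measure(i_samples, v_samples, vp_samples):
    """Return RMS values, P, Q, S and power factor for one block of samples."""
    n = len(i_samples)
    i_rms = math.sqrt(sum(x * x for x in i_samples) / n)
    v_rms = math.sqrt(sum(x * x for x in v_samples) / n)
    p = sum(i * v for i, v in zip(i_samples, v_samples)) / n    # active power
    q = sum(i * v for i, v in zip(i_samples, vp_samples)) / n   # reactive power
    s = i_rms * v_rms                                           # apparent power
    return {
        "I_rms": i_rms,
        "V_rms": v_rms,
        "P": p,
        "Q": q,
        "S": s,
        "power_factor": p / s if s else 0.0,
    }


if __name__ == "__main__":
    # Illustrative 50 Hz test signals: 325 V peak, 10 A peak, current lagging by 60 degrees.
    w = 2 * math.pi * 50 / SAMPLE_RATE_HZ
    v = [325 * math.sin(w * k) for k in range(SAMPLE_RATE_HZ)]
    vp = [325 * math.sin(w * k - math.pi / 2) for k in range(SAMPLE_RATE_HZ)]
    i = [10 * math.sin(w * k - math.pi / 3) for k in range(SAMPLE_RATE_HZ)]
    for name, value in measure(i, v, vp).items():
        print(f"{name}: {value:.3f}")
```

With one second of samples at 4096 Hz, a 50 Hz signal covers a whole number of periods, so the block averages are not distorted by a partial period.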

![DSP block structure](image)

The DSP consists of several sub-blocks (Fig. 6): frequency measurement [4]; the RAM memory block; the multiplication, filtering and accumulation block; current and voltage RMS calculation [2]; active, reactive and apparent power and power factor calculation [7]; and the control unit [5], which manages all other parts of the DSP. A single 24-bit data bus connects these sub-blocks to the memory block. The DSP performs over 16000 multiplications and over 50000 additions per second. The arithmetic units for multiplication, division and square root use a gated-clock architecture, which reduces power consumption, so the DSP is optimized for low-power operation. A detailed description of the DSP performance and realization can be found in [8].

Communication between the IC core and an external microprocessor is performed through the Communication Serial Port block (CSP in Fig. 1), which allows the user to calibrate components of the power meter and read the measured results.

3. DP INTERFACE

The ports of the DP interface belong to several groups: the clock signal input pin, the interface with the on-chip sigma-delta AD converters, pins for DP testing, CSP interface pins and external control pins for selecting the chip operation mode.

The clock signal input, with a frequency of 4.194 MHz, comes from an analog standard cell.

Data feeding the digital filters can come from one of two sources. The first is the AD converter outputs. The other is a set of external pins that enable the DP to be tested independently of the AD converter outputs.

Four pins form the interface with the on-chip AD converters: AD_U, AD_H1, AD_H2 and F_ANALOG. The first three carry the sigma-delta AD converter outputs to the DP inputs. AD_U is the output of the 2nd-order AD converter in the analog voltage channel block. AD_H1 and AD_H2 are the outputs of the 3rd-order sigma-delta AD converter of the current channel. The fourth port, F_ANALOG, is the 524288 Hz clock signal that the DP provides for the AD converters. It is generated in the clock divider circuit within the DSP block.

Four external pins make up the interface for DP testing: U, H1, H2 and CLK8. The U pin substitutes for the voltage channel sigma-delta converter output. The H1 and H2 pins substitute for the current channel sigma-delta outputs. The data rate of the signals entering the IC on these three pins is exactly 524288 Hz. The fourth pin, CLK8, is a clock signal of the same frequency that the clock divider circuit provides to the external data source. CLK8 is produced within the same clock divider circuit as F_ANALOG.

![Test setup for DP testing](image)

The use of the testing pins can be briefly explained as follows. The setup for DP testing is shown in Fig. 7. It consists of a 512 kB EEPROM memory, binary counters and voltage level translation circuits.

Signal CLK8 is used to increment the counters whose outputs give the address for the EEPROM. The ideal AD converter output values, stored in the 512 kB EEPROM memory, are periodic with a frequency of 50 Hz. During every second of the testing procedure, the whole content of the EEPROM is read and loaded into the chip. The U, H1 and H2 pins are fed from three of the eight EEPROM outputs. Since the EEPROM and the binary counters work with a supply voltage of 5 V and the chip with 3.3 V, voltage level shifters are used to shift signals from 3.3 V to 5 V and vice versa.
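
The numbers in this setup fit together: one EEPROM byte is read per CLK8 period, so the whole memory is traversed in one second. The Python sketch below checks this arithmetic, assuming that 512 kB means 512 × 1024 bytes. The assignment of U, H1 and H2 to particular EEPROM output bits is not given in the text; the packing shown here is hypothetical.

```python
EEPROM_BYTES = 512 * 1024   # assumption: 512 kB = 524288 bytes
CLK8_HZ = 524288            # one EEPROM address per CLK8 period
MAINS_HZ = 50               # period of the stored test signal


def readout_time_s():
    """Time needed to step through every EEPROM address once."""
    return EEPROM_BYTES / CLK8_HZ


def address_bits():
    """Number of counter bits needed to address the whole EEPROM."""
    return (EEPROM_BYTES - 1).bit_length()


def pack_sample(u, h1, h2):
    """Pack three one-bit streams into one EEPROM byte.

    Hypothetical bit assignment: bit 0 = U, bit 1 = H1, bit 2 = H2.
    """
    return (u & 1) | ((h1 & 1) << 1) | ((h2 & 1) << 2)


if __name__ == "__main__":
    print("readout time [s]:", readout_time_s())
    print("address bits:", address_bits())
    print("signal periods per pass:", MAINS_HZ * readout_time_s())
```

Because the readout takes exactly one second and the stored signal contains a whole number of 50 Hz periods, the address counters can wrap around without a discontinuity in the test signal.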

The bidirectional pins SDA and SCL are used as the communication interface. One additional pin, CSN, enables the CSP. More details about the CSP realization can be found in [9].

The group of pins controlling the chip operation consists of ADEN, MODE1, MODE0 and RUNSTOP. Pin ADEN determines whether the digital filters are loaded from the AD converter outputs or from the testing pins U, H1 and H2.

MODE1 and MODE0 are used to select one of four chip operating modes [5]: normal operating mode (NOM), testing (TM), initialization (INI) and reset (RST). RUNSTOP is an additional pin used only in testing mode.

A programmable energy-to-frequency converter within the DSP part outputs pulses on pin EOUT in proportion to the consumed power [7]. This allows the chip to be interfaced with an external counter. By default, one pulse is generated on the EOUT pin for every 1 Wh (watt-hour) of measured active energy. This pulse-frequency-to-energy ratio can be changed by loading the appropriate value into a dedicated register within the DSP part.

There are two additional pins for testing purposes [6]: 1SEC and ERROR. One pulse arrives on 1SEC every second. It serves as the reference for the signal on the ERROR pin, which indicates whether the chip is functioning correctly. Possible errors inside the parts of the DSP block can be found by comparing the signal on the ERROR pin with the signal on 1SEC.
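
The behaviour of an energy-to-pulse converter can be sketched as an accumulator that emits one pulse each time a programmable energy quantum is reached. The Python model below is a behavioural illustration only and does not describe the circuit from [7]. The class name, the `wh_per_pulse` parameter and the one-second update interval are assumptions.

```python
JOULES_PER_WH = 3600  # 1 Wh = 3600 W*s


class EnergyPulseConverter:
    """Behavioural model: accumulate energy, emit one pulse per energy quantum."""

    def __init__(self, wh_per_pulse=1):
        # Models the programmable ratio; the default matches 1 pulse per Wh.
        self.joules_per_pulse = wh_per_pulse * JOULES_PER_WH
        self.residual_joules = 0

    def add_interval(self, active_power_w, seconds=1):
        """Add the energy of one interval and return the number of pulses.

        Assumes non-negative active power.
        """
        self.residual_joules += active_power_w * seconds
        pulses = int(self.residual_joules // self.joules_per_pulse)
        self.residual_joules -= pulses * self.joules_per_pulse
        return pulses


if __name__ == "__main__":
    converter = EnergyPulseConverter()
    # A constant 1800 W load consumes 1 Wh every two seconds.
    print([converter.add_interval(1800) for _ in range(6)])
```

Keeping the remainder in the accumulator after each pulse ensures that no energy is lost between pulses, whatever ratio is programmed.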

4. CLOCK TREE GENERATION

The main job of clock routing design is to control the clock skew from the clock pad to all clocked elements. The main obstacle to clock distribution is capacitance, with resistance playing a secondary but important role.

There are two complementary ways to improve clock distribution:

*Physical wire design.* The layout can be designed to make clock delays more even, or at least more predictable.

*Circuit architecture design.* The circuit driving the clock distribution network can be designed to minimize delays by using several stages of drivers.

In general, both techniques must be used to distribute clock signals with adequate characteristics.

The two most common styles of physical clocking networks are the H tree and the balanced tree. The H tree is a very regular structure, which allows predictable delay. The balanced tree style, which was applied in this design, takes the opposite approach of synthesizing the layout based on the characteristics of the circuit to be clocked.

The digital blocks are synthesized in separate synthesis runs with the Build Gates synthesis tool. During each synthesis run, the memory elements of one block are clustered into groups. The clustering is then used to guide the cell placement process. Like synthesis, placement, clock tree generation and routing are also done separately for the different blocks. The clock trees within each digital block are generated based on the skew information gathered during cell placement. The resulting clock trees are irregular in shape but have been balanced during design to minimize the skew.

The most common style of clock distribution uses a hierarchical clock driver tree consisting of driver stages (buffer or inverter standard cells). The clock driver tree has two primary goals:

1. To reduce the total delay from the design clock input signal through the clock buffer tree to the clock inputs of all flip-flops.

2. To minimize the clock skew among flip-flops driven by the same clock signal source.

During clock tree generation, care had to be taken to insert an equal number of driver stages into the clock tree between the clock tree source and every memory element. Also, since some of the drivers are inverting, care had to be taken to use an even number of inverters on every path, to avoid delivering an inverted clock signal to the memory elements.
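
Both conditions are easy to check on a netlist-like description of a clock tree. The Python sketch below walks a small example tree and reports whether every flip-flop sees the same number of driver stages and an even number of inverters. The dictionary format and all instance names are hypothetical and are unrelated to the tool flow used in the design.

```python
def leaf_paths(tree, node, stages=0, inverters=0):
    """Yield (leaf, driver stages, inverters) for every leaf below node.

    tree maps a driver name to (kind, children), with kind "buf" or "inv".
    Any name that is not a key of tree is treated as a flip-flop clock pin.
    """
    if node not in tree:
        yield node, stages, inverters
        return
    kind, children = tree[node]
    for child in children:
        yield from leaf_paths(tree, child, stages + 1, inverters + (kind == "inv"))


def check_clock_tree(tree, root):
    """Return (equal_stage_count, clock_not_inverted) for the whole tree."""
    paths = list(leaf_paths(tree, root))
    equal_stage_count = len({stages for _, stages, _ in paths}) == 1
    clock_not_inverted = all(inv % 2 == 0 for _, _, inv in paths)
    return equal_stage_count, clock_not_inverted


if __name__ == "__main__":
    # Hypothetical two-level tree built from inverting drivers only.
    example = {
        "drv_root": ("inv", ["drv_a", "drv_b"]),
        "drv_a": ("inv", ["ff1", "ff2"]),
        "drv_b": ("inv", ["ff3"]),
    }
    print(check_clock_tree(example, "drv_root"))
```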

In addition, generating three clock trees within each hard block causes further difficulties. Clock distribution is often considered only in the last phases of the design process, when most of the chip is already frozen. Without careful clock tree planning, delay problems can escalate at the block level, due either to unanticipated long wires or to very long chains of logic that were not recognized earlier.

It is therefore essential to consider clock distribution in the earlier phases of the design. Multiple clock trees inside the blocks have to be specially designed to avoid setup and hold timing problems, independently of the number of buffers inserted in the tree and of the data propagation delays in the combinational circuits.

To avoid setup and hold timing violations in data propagation between registers clocked at different frequencies, the rising edges of clock signals with different frequencies should not occur at the same time. One simple but efficient solution, independent of the number of buffers inserted in the clock trees, is a phase delay between the clock signals. For example, as mentioned earlier, the FIR filter blocks use clock signals with frequencies of 16384 Hz, 8192 Hz and 4096 Hz. The clock divider circuit was built so that the rising edge of the lower-frequency signal coincides with a falling edge of the higher-frequency signal (Fig. 8).

![Figure 8: Clock signals used in FIR filter blocks](Image)
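
This edge relationship can be reproduced with a simple model. The Python sketch below takes the clocks from the bits of a binary up-counter, for which a higher bit rises exactly when the lower bits fall. This is one possible way to obtain the relationship and is an assumption; the text does not state how the divider circuit achieves it. The bit positions assume an input clock of 2^22 Hz.

```python
def counter_clocks(num_cycles, bits):
    """Simulate a binary up-counter; return {bit: level per input clock cycle}."""
    return {b: [(n >> b) & 1 for n in range(num_cycles)] for b in bits}


def edges(wave, rising):
    """Return the set of cycle indices at which wave has a rising/falling edge."""
    before, after = (0, 1) if rising else (1, 0)
    return {t for t in range(1, len(wave)) if (wave[t - 1], wave[t]) == (before, after)}


def rising_on_falling(slow, fast):
    """True if every rising edge of slow coincides with a falling edge of fast."""
    return edges(slow, True) <= edges(fast, False)


if __name__ == "__main__":
    # With a 2**22 Hz input, counter bits 7, 8 and 9 toggle at
    # 16384 Hz, 8192 Hz and 4096 Hz respectively.
    clk = counter_clocks(4096, bits=(7, 8, 9))
    print(rising_on_falling(clk[8], clk[7]))   # 8192 Hz against 16384 Hz
    print(rising_on_falling(clk[9], clk[8]))   # 4096 Hz against 8192 Hz
    print(edges(clk[9], True) & edges(clk[8], True))  # shared rising edges
```

In this model the rising edges of any two of the three clocks are separated by at least half a period of the faster clock, which leaves a large margin compared with buffer and logic delays.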

The same rule holds for the SinC filter blocks, which use clocks with frequencies of 524288 Hz, 65536 Hz and 16384 Hz.

Even though the complete system is designed in a synchronous fashion, it must still communicate with the outside world, which is generally asynchronous. An asynchronous input can change value at any time relative to the clock edges of the synchronous system. Therefore, the asynchronous signal must be resolved to either the high or the low state before it is fed into the synchronous environment. The circuit implementing the synchronization function is a simple D latch.

The CSP interface pins SDA and SCL are asynchronous. Since these signals drive the clock inputs of some memory elements within the CSP, it was necessary to use D latches for synchronization in order to eliminate setup and hold violations (Fig. 9).

**Figure 9: Synchronization of SDA and SCL signals**

### 5. DP VERIFICATION

The main parts of the DP (filters, DSP and CSP) were first described in VHDL and simulated separately. Afterwards, the parts of the digital design were assembled into the complete digital circuit and simulated again. Several tasks had to be accomplished while writing the test bench for the DP. Since the test bench had to communicate with the DP over only its two pins (SDA and SCL), the two-wire serial protocol had to be described.

The AD converter outputs (which enter the DP over the AD_U, AD_H1 and AD_H2 input ports) had to be generated in MATLAB and stored in text files. During DP simulation, the AD converter outputs were read and applied to the AD_U, AD_H1 and AD_H2 input pins at a data rate of 524288 Hz.

Since the DSP calculates new values (current and voltage RMS values, apparent power S, active power P, reactive power Q, power factor Cos(φ), frequency F, reactive energy Eq and active energy Ea) only once per second of simulation time, the output data could not be observed conveniently as standard waveforms. Instead, after every second of simulation time the results were read over the CSP interface pins, converted into the appropriate digital words and stored in text files.

Simulations in the NCsim VHDL simulator were performed three times using the same test bench: before synthesis, after synthesis and after clock tree generation. The results obtained were the same in all three cases, confirming the correctness of the design.

The Integrated Power Meter IC is implemented in a 0.35 µm CMOS standard cell technology. After synthesis in Build Gates, the estimated DP area, expressed in equivalent NAND-gate units, was 55230 units.

Finally, Silicon Ensemble was used for floorplanning, placement and routing, as well as for clock and reset tree generation for the complete circuit.

### 6. CONCLUSION

The digital part of an Integrated Power Meter IC was considered, starting with a description of its digital blocks. After that, its interface and clock tree generation were described. The system was verified through VHDL simulations and synthesized with Cadence tools.

The DP has a minimal number of additional pins used only for testing. Because the layout of the DP is divided into several hard blocks and each of these blocks has multiple clock signals, clock distribution was considered in the earlier phases of the design process. Setup and hold timing problems in data propagation between memory elements with different clock signals are avoided independently of the number of driver stages inserted in the clock trees and of the data propagation delays in the combinational circuits.

### REFERENCES


**Abstract** – This paper describes the digital part of an integrated power meter, which measures several important parameters of the power line signal: RMS values of voltage and current, active, reactive and apparent power, frequency and energy. The implementation of the digital part of the chip consists of several blocks that use multiple clock signals of different frequencies. The paper presents the basic blocks, the routing of the clock to them and the way they are tested.

**DESIGN AND TEST ARCHITECTURE OF THE DIGITAL PART OF AN INTEGRATED POWER METER**

Borisav Jovanović, Milunka Damnjanović, Predrag Petković