Timing Closure of Memory Partitions for Lower Node Technologies

Piyush Bhatasana

Abstract—Metal interconnects connect the different parts of the circuitry that realize a System on Chip (SoC) design. In advanced process technologies, the metal interconnects affect the performance of the design. In nanometer process technologies, coupling between interconnects causes crosstalk and noise, which affect the operating speed of the design. Thus, physical design and verification for advanced process technologies should include the effects of noise and crosstalk. If the timing of a design is not verified, the design may not perform at the desired operating speed. Power and area are the other factors that must be considered along with timing for a faster design, and there is always a trade-off between these three factors. Static Timing Analysis (STA) is one of the many techniques designers use to verify the timing of a design and to close the design with respect to timing, which is called timing closure.

Keywords: Static Timing Analysis (STA), Noise and crosstalk, Setup and Hold check, Signoff, Engineering Change Order (ECO), DTA

I. INTRODUCTION

Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining hundreds of thousands of transistors or devices into a single chip. With the advancement in VLSI technology, there is a constant reduction in the feature size of VLSI devices (i.e., the minimum transistor size) [1-2]. The feature size decreased from about 0.25 µm in 1997 to about 10 nm today [3]. Such continual miniaturization of devices has had a strong impact on VLSI technology in several ways. One of those impacts is on timing analysis. As the number of transistors per chip increases, the rated speed or frequency that a design must meet also increases, and the number of timing paths rises exponentially because of the multiple connections in the design [4]. The continuous decrease in feature size and the corresponding increase in chip density and operating frequency have made exhaustive timing analysis a major concern in VLSI design. Hence, Static Timing Analysis has branched out into a separate domain of expertise in modern System on Chip (SoC) design [5-6].

II. TIMING CLOSURE IMPLEMENTATION

The implementation is carried out in two stages, post IC Compiler (ICC) and ICC. In the post-ICC stage, the violations in the design are fixed in an incremental way. The Synopsys PrimeTime tool is used in this stage to analyse the timing paths and source the ECOs [2-3]. A timing path can be one of the four types shown in Fig. 1. The names of the paths are as follows.
1) Flop to Flop Paths
2) Input to Flop Paths
3) Flop to Output Paths
4) Input to Output Paths

Fig. 1. Four types of Timing Paths
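
As a minimal sketch of how the four path types above can be told apart, the following Python snippet classifies a path by the kind of its startpoint and endpoint. The `Node` enum, the `PATH_TYPES` table and the `classify_path` function are hypothetical names introduced here for illustration; they are not part of any tool.

```python
from enum import Enum


class Node(Enum):
    """Kind of object at the start or end of a timing path (illustrative)."""
    INPUT_PORT = "input"
    OUTPUT_PORT = "output"
    FLOP = "flop"


# Map (startpoint kind, endpoint kind) to the path type named in the list.
PATH_TYPES = {
    (Node.FLOP, Node.FLOP): "flop-to-flop",
    (Node.INPUT_PORT, Node.FLOP): "input-to-flop",
    (Node.FLOP, Node.OUTPUT_PORT): "flop-to-output",
    (Node.INPUT_PORT, Node.OUTPUT_PORT): "input-to-output",
}


def classify_path(startpoint: Node, endpoint: Node) -> str:
    """Return the path type, or raise ValueError for an invalid pair."""
    try:
        return PATH_TYPES[(startpoint, endpoint)]
    except KeyError:
        raise ValueError(f"not a valid timing path: {startpoint} -> {endpoint}")
```

A path cannot start at an output port or end at an input port, so any such pair is rejected rather than classified.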

III. CLOCK TREE SYNTHESIS

Once the clock trees are built, they undergo several buffer insertions and subsequent skew optimizations. An important point to note in CTS is the use of clock buffers, instead of normal buffers, in the clock path. Minimum pulse width is an important Design Rule Check (DRC) performed on any design: a minimum pulse width must be maintained for proper functioning of the entire circuit. The following optimizations are done during clock tree synthesis:
1) Skew Optimization: The propagation delay is controlled by the size and location of the buffers.
2) Buffer sizing: A binary search algorithm is implemented to find the appropriate size of clock buffers through an iterative process, since the size affects downstream optimizations.
3) Wire sizing: The size of the wire is an important factor, as it affects the power consumption and is prone to manufacturing issues.
4) Crosstalk and Noise: Any undesired or unintentional effect on the nominal operation of the chip is called noise. Crosstalk noise is the unintentional coupling of activity between two or more signals. It is mainly caused by capacitive coupling between neighboring signals.

In nanometer devices, this noise can affect the functionality or the timing of the device. There are several causes of this noise in the chip. The number of metal layers in the chip has increased drastically because of technology shrinking. In addition, the wires used nowadays are thin and tall rather than wide and short, so the capacitance between two neighboring wires is larger. The standard cell count has also increased in the latest technologies, which increases congestion and causes many more interactions. Higher-frequency designs have faster edge rates, which cause more current spikes and a greater coupling impact. The noise margin of the designs is small because of the low supply voltage, as shown in Fig. 2.
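
The effect of tall, closely spaced wires can be estimated with a parallel-plate approximation of the sidewall capacitance between two neighboring wires, C = eps0 * eps_r * t * L / s, where t is the wire thickness (height), L the parallel run length and s the spacing. The sketch below is a first-order estimate only: it ignores fringing fields and capacitance to other layers, and the dielectric constant and all dimensions are assumed values chosen for illustration, not data from this work.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def sidewall_coupling_cap(thickness_m, length_m, spacing_m, eps_r=3.0):
    """Parallel-plate estimate of the coupling capacitance (in farads)
    between the facing sidewalls of two neighboring wires.
    eps_r = 3.0 is an assumed dielectric constant."""
    return EPS0 * eps_r * thickness_m * length_m / spacing_m


# Same run length and spacing, two assumed wire heights.
flat_wire = sidewall_coupling_cap(40e-9, 100e-6, 50e-9)
tall_wire = sidewall_coupling_cap(100e-9, 100e-6, 50e-9)

print(f"flat wire: {flat_wire * 1e15:.2f} fF")
print(f"tall wire: {tall_wire * 1e15:.2f} fF")
print(f"ratio    : {tall_wire / flat_wire:.2f}")
```

In this model the coupling capacitance grows in proportion to the wire height and in inverse proportion to the spacing, which is why tall wires at tight pitch couple more strongly.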

IV. PHYSICAL DESIGN

A. Partitioning

1) Partitioning: The process of splitting up the entire circuit or system into smaller subsystems or modules.
2) Floor planning: The process of arranging the locations of hard IPs or macros, the external ports and the sub-circuits or modules.
3) Placement: The process of determining the spatial location of each cell within a particular block.
4) Clock Tree Synthesis (CTS): The process of determining the buffering, gating and routing of the clock signal to meet the required clock skew and clock delay estimates.
5) Routing: The process of allocating resources for interconnections between different metal layers and routing tracks in channels.
6) Timing Closure: The optimization of the performance of the SoC, carried out by specialized placement and routing techniques and by implementing ECOs.

B. Inputs Required for Physical Design

The PrimeTime HyperScale hierarchical analysis method analyzes the block-level and top-level portions of the design in separate runs and accurately handles the timing interfaces across hierarchical boundaries. The discrepancies seen in the correlation between the two levels are removed using context. Context is based on a Synopsys technology called HyperScale, and the context information is provided by the full-chip-level team. Updating the design at full-chip level takes more time than updating it at block level. If a violation is made to appear the same at block level and full-chip level, then fixing it at block level also fixes it at full-chip level. In this way, an ECO can be verified quickly and the design can ultimately be closed quickly, achieving the Turn Around Time (TAT). Creating a session requires certain inputs. Since it is a context-based session, the context information is the first and foremost input required. The HYPERSCALE variable is enabled for this process. The netlist, sif, UPF and other timing information are saved in the working area. Finally, the session creation command is run.

C. Stages in synthesis

1. Import Design
2. UPF
3. Uniquify
4. Constraints
5. Compile
6. Insert DFT
7. Ungroup
8. Re-timing
9. Syn final

V. TIMING CLOSURE METHODOLOGY

Timing closure is done by analyzing the design with respect to its timing and fixing all kinds of timing violations. STA is used to analyze the design.

The methodologies for timing closure are applied at two stages of the design flow: the post-ICC/signoff stage and the pre-signoff/ICC stage.
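
The setup and hold checks that STA performs on a flop-to-flop path can be sketched with the standard textbook slack equations. The snippet below is illustrative only: the `PathTiming` class and its field names are hypothetical, clock uncertainty and derating are left out, and the numbers (in picoseconds) are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing numbers for one flop-to-flop path, all in picoseconds."""
    period: int
    launch_latency: int   # clock arrival at the launch flop
    capture_latency: int  # clock arrival at the capture flop
    clk_to_q: int
    data_delay_max: int   # slowest data path delay (used for setup)
    data_delay_min: int   # fastest data path delay (used for hold)
    setup_time: int
    hold_time: int


def setup_slack(p: PathTiming) -> int:
    """Required time minus arrival time for the setup check."""
    required = p.period + p.capture_latency - p.setup_time
    arrival = p.launch_latency + p.clk_to_q + p.data_delay_max
    return required - arrival


def hold_slack(p: PathTiming) -> int:
    """Arrival time minus required time for the hold check."""
    arrival = p.launch_latency + p.clk_to_q + p.data_delay_min
    required = p.capture_latency + p.hold_time
    return arrival - required


path = PathTiming(period=1000, launch_latency=200, capture_latency=220,
                  clk_to_q=80, data_delay_max=950, data_delay_min=60,
                  setup_time=50, hold_time=30)
print(setup_slack(path))  # -60: setup violation
print(hold_slack(path))   # 90: hold is met
```

A negative slack is a violation. Note that the clock period appears only in the setup check, not in the hold check.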

A. SETUP AND HOLD VIOLATION FIX

Methods to fix setup violations are given as,
- Upsizing the standard cells (increasing the drive strength) in the data path.
- Pulling the launch clock.
- Pushing the capture clock.
- Removing buffers from the data path (the hold margin should be checked).
- VT swap: replacing high-VT cells with low-VT cells.
- Replacing a buffer with two inverters placed farther apart so that the delay can be adjusted.

Methods to fix hold violations are given as,
- Downsizing the cells (decreasing the drive strength) in the data path.
- Pulling the capture clock.
- Pushing the launch clock.
- Inserting buffers, inverter pairs or delay cells in the data path.
- Increasing the wire load model can also fix the hold violation.
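
The interaction between hold fixing and setup margin can be illustrated with a small calculation. The sketch below estimates how many delay buffers are needed to remove a hold violation and refuses the fix if it would create a setup violation. It assumes, pessimistically, that every picosecond added to the data path is subtracted from the setup slack of the same path; the function name and the numbers are hypothetical.

```python
import math


def hold_fix_buffers(hold_slack_ps, setup_slack_ps, buffer_delay_ps):
    """Return the number of delay buffers to insert in the data path,
    0 if hold is already met, or None if the fix would break setup."""
    if hold_slack_ps >= 0:
        return 0
    # Smallest buffer count whose total delay covers the hold violation.
    count = math.ceil(-hold_slack_ps / buffer_delay_ps)
    added_delay = count * buffer_delay_ps
    # The added delay eats into the setup margin of the same path.
    if setup_slack_ps - added_delay < 0:
        return None
    return count


print(hold_fix_buffers(-45, 200, 20))  # 3
print(hold_fix_buffers(-45, 50, 20))   # None
```

In the second call the path has only 50 ps of setup margin, so inserting 60 ps of delay is rejected; such a path would need a different fix, for example on the clock side.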

VI. UNFIXABLE VIOLATIONS

A: There are available library cells outside the area limit.
B: Delay improvement is too small to fix the violation.
C: The violation is in clock network.
I: Buffer insertion with the given library cells cannot fix the violation.
S: Cell sizing with alternative library cells cannot fix the violation.
T: Timing margin is too tight to fix the violation.
U: UPF restricts fixing the violation.
V: The net or cell is invalid or has a do-not-touch attribute.
W: Fixing the violation might degrade DRC violations.
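
To see which reasons dominate, the codes listed above can be tallied over all remaining violations. The sketch below assumes a hypothetical input in which each violation carries a string of one or more reason letters (for example "SI"); this is an illustrative data format, not the report format of any specific tool.

```python
from collections import Counter

# Short labels for the reason codes listed above.
UNFIXABLE_REASONS = {
    "A": "library cells outside area limit",
    "B": "delay improvement too small",
    "C": "violation in clock network",
    "I": "buffer insertion cannot fix",
    "S": "cell sizing cannot fix",
    "T": "timing margin too tight",
    "U": "UPF restriction",
    "V": "invalid or do-not-touch net/cell",
    "W": "fix might degrade DRC",
}


def summarize_unfixable(codes_per_violation):
    """Count how often each reason code appears across all violations."""
    counts = Counter()
    for codes in codes_per_violation:
        for code in codes:
            if code not in UNFIXABLE_REASONS:
                raise ValueError(f"unknown reason code: {code}")
            counts[code] += 1
    return counts


summary = summarize_unfixable(["SI", "C", "ST"])
for code, count in summary.most_common():
    print(code, count, UNFIXABLE_REASONS[code])
```

A summary like this helps decide where to spend effort: many "C" entries point to clock-network work, while many "S" or "I" entries suggest the library or the placement is the limit.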
A. MACRO PLACEMENT

B. IO PLACEMENT

The IO ports are the pins through which a block communicates with the external world or with other partitions. All the IO ports are placed only on the boundary of the design. IO port optimization is a technique through which port-pin alignment is achieved, which paves the way for better timing. In IO port optimization, the ports of the design and the pins inside the design are aligned so that the distance between them is minimal. Hence, routing can be done efficiently, using less metal. The fact that the ports communicate with the neighboring partitions should also be considered during IO port optimization. Also, if the port size is the same as the pin size, the chance of DRC violations during routing is very low, as shown in Fig. 3.

C. STANDARD CELL PLACEMENT

In modern designs, the standard cell count has increased drastically, and placing the cells one by one is a tedious task. Hence, constraints are given for placing them. A core area is allocated for standard cell placement, and all the cells are confined within that area. Another technique is bound creation. A bound is an imaginary area in the design within which a group of cells is confined. In this way, the distance between the cells is reduced and routing can be done with minimal metal, which ultimately improves the timing. If a cell is badly placed, it can be moved manually to a better location.
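
The benefit of a bound can be illustrated with half-perimeter wirelength (HPWL), a common first-order proxy for the routing metal a net needs. The sketch below compares the HPWL of one net before and after its cells are confined to a bound; the coordinates are made-up values in arbitrary units, and HPWL is used here only as an illustration, not as the metric used in this work.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net: the width plus the height of
    the smallest rectangle enclosing all pin locations (x, y)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# The same three-pin net with cells spread out, then confined to a bound.
spread_out = [(0, 0), (40, 10), (15, 60)]
in_bound = [(0, 0), (12, 5), (6, 14)]

print(hpwl(spread_out))  # 100
print(hpwl(in_bound))    # 26
```

A shorter estimated wirelength means less wire resistance and capacitance on the net, which is the mechanism by which bounds help timing.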

D. CTS

There are two important steps in CTS: clock tree building and clock tree balancing. Once CTS is done, the skew and latency reports should be analyzed, as shown in Fig. 4. If they are found to be poor, the clock tree has to be rebuilt to improve them, as shown in Figs. 5 and 6.
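
The quantities examined in the skew and latency reports can be sketched as follows. Given the clock arrival time (latency) at each sink, the global skew is the difference between the largest and smallest latency. The sink names and values below are hypothetical, in picoseconds.

```python
from typing import Dict


def clock_tree_metrics(sink_latency_ps: Dict[str, int]) -> Dict[str, int]:
    """Return the maximum latency, minimum latency and global skew
    for a set of clock sinks."""
    latencies = list(sink_latency_ps.values())
    return {
        "max_latency": max(latencies),
        "min_latency": min(latencies),
        "global_skew": max(latencies) - min(latencies),
    }


sinks = {"ff1/CK": 410, "ff2/CK": 455, "mem0/CLK": 430}
print(clock_tree_metrics(sinks))
# {'max_latency': 455, 'min_latency': 410, 'global_skew': 45}
```

If the skew or latency exceeds the target for the block, the tree is rebuilt or rebalanced and the metrics are checked again.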

E. CONTEXT BASED SESSION CREATION FOR A BLOCK

- The path is violating in both the levels with a small difference.
- The path is violating in full chip level but meeting in block level with huge margin.
- The path is violating in full chip level but in block level the same path is unconstrained.
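
The three cases above can be distinguished mechanically once the slack of the same path is known at both levels. The following sketch is a hypothetical helper: the function name, the 10 ps tolerance for a "small difference", and the use of `None` for an unconstrained path are all assumptions made for illustration.

```python
from typing import Optional


def correlation_case(full_chip_slack_ps: int,
                     block_slack_ps: Optional[int],
                     tolerance_ps: int = 10) -> str:
    """Classify how a path correlates between full-chip and block level.
    block_slack_ps is None when the path is unconstrained in the block."""
    if full_chip_slack_ps >= 0:
        return "meeting at full chip level"
    if block_slack_ps is None:
        return "unconstrained at block level"
    if block_slack_ps >= 0:
        return "meeting at block level"
    if abs(full_chip_slack_ps - block_slack_ps) <= tolerance_ps:
        return "violating at both levels, small difference"
    return "violating at both levels, large difference"


print(correlation_case(-50, -45))   # violating at both levels, small difference
print(correlation_case(-50, 120))   # meeting at block level
print(correlation_case(-50, None))  # unconstrained at block level
```

Paths in the second and third categories indicate that the block-level session is missing context from the full chip, so they are the ones a context-based session is meant to correct.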

F. IO PORT OPTIMIZATION

As part of the layout information, the placement information of the IO ports is given in addition to the boundary of the design. The ports should also be placed on the boundary of the design. These ports act as the gateway between the external world and the components inside the chip. Aligning the ports with the pins of macros or cells reduces the routing distance, improves the timing and helps the neighbouring partitions. In figure 5.4, it can be seen that the IO ports are randomly placed with respect to the pins of the macro, which degrades the timing of the design considerably. For the optimization, the pins that are connected to the ports must first be identified. This is done using the all_connected command, in the floorplan stage, so that the pin information is obtained easily. The get_ports command is used to get all the ports. Then, the location of each pin is identified from its bounding box, which is read from the bbox attribute of the pin using the get_attribute command. After this, the ports are moved to the exact location of the pins. From there, the ports are moved to the boundary by a specific delta value. The delta value is the difference between the design's X or Y coordinate and the macro's X or Y coordinate, as shown in Figs. 7 and 8.
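
The geometry of the port move described above can be sketched in a few lines. The snippet takes the bounding box of a pin, places the port at the pin centre, and then shifts it by the delta to the design boundary. It is an illustration of the arithmetic only, not tool script: the function name is hypothetical, the design is assumed to be a rectangle with its origin at (0, 0), and choosing the nearest edge is an assumption made here.

```python
def align_port_to_pin(pin_bbox, design_width, design_height):
    """Return (edge, delta, port_xy) for a port aligned with a pin.
    pin_bbox is ((x1, y1), (x2, y2)), the lower-left and upper-right
    corners of the pin."""
    (x1, y1), (x2, y2) = pin_bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # start at the pin centre

    # Distance from the pin centre to each edge of the design.
    deltas = {
        "left": cx,
        "right": design_width - cx,
        "bottom": cy,
        "top": design_height - cy,
    }
    edge = min(deltas, key=deltas.get)  # assumed rule: nearest edge
    delta = deltas[edge]

    # Shift the port by delta along one axis so it lands on the boundary.
    port_xy = {
        "left": (0, cy),
        "right": (design_width, cy),
        "bottom": (cx, 0),
        "top": (cx, design_height),
    }[edge]
    return edge, delta, port_xy


print(align_port_to_pin(((90, 40), (94, 44)), 100, 200))
# ('right', 8.0, (100, 42.0))
```

Because only one coordinate changes in the final shift, the port stays in line with the pin and the connecting route is a straight segment of length delta.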
not depend on the timing period. Hold violations occur only after the clock tree is built. Before the clock tree is built, all the clocks are ideal, that is, the latency and skew are zero. Hence, the setup violations are fixed before CTS. The clock pulling and pushing techniques were not implemented in this project and are to be explored in the future. Various techniques for optimizing the timing in the pre-signoff stage were also discussed. The techniques used are partition specific; there can be designs where IO port optimization is not possible at all. Only the shielding technique of crosstalk fixing was discussed. There are other techniques for fixing crosstalk noise, which can be studied and explored in the future. The other design constraints, power and area, should be considered and checked throughout the optimization process. Generally, those factors are also analyzed and optimized in parallel. As technology advances, more violations will occur in future designs and the complexity of fixing timing issues will increase. However, STA will remain valid for analyzing the design and fixing the violations.

REFERENCES

AUTHORS PROFILE
Piyush Bhatasana received the B.E. degree in Electronics and Communication Engineering in 2002, the M.Tech. degree in Electrical Engineering (Microelectronics, VLSI and Display Technology) in 2010, and the Ph.D. in RF MEMS design in 2019. Presently he is working on RF MEMS shunt switches, RF devices, VLSI design, digital converters, Static Timing Analysis (STA), noise and crosstalk, setup and hold checks, signoff, Engineering Change Order (ECO), and DTA of VLSI designs.